
[v4,18/37] target: Make ABORT and LUN RESET handling synchronous

Message ID 20170208222507.25715-19-bart.vanassche@sandisk.com (mailing list archive)
State Superseded
Headers show

Commit Message

Bart Van Assche Feb. 8, 2017, 10:24 p.m. UTC
Instead of invoking target driver callback functions from the
context that handles an abort or LUN RESET task management function,
only set the abort flag from that context and perform the actual
abort handling from the context of the regular command processing
flow. This approach has the following advantages:
- The task management function code becomes much easier to read and
  to verify since the number of potential race conditions against
  the command processing flow is strongly reduced.
- It is no longer needed to store the command state into the command
  itself since that information is no longer needed from the context
  where a task management function is processed.

Notes:
- With this patch applied the CMD_T_ABORTED flag is checked at two points
  by the target core: just before local execution of a command starts
  (see also target_execute_cmd()) and also just before the response is
  sent (see also target_complete_cmd()).
- A new reference count is introduced that tracks the number of
  references held by the target driver, namely se_cmd.tgt_ref.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
---
 drivers/target/target_core_internal.h  |   3 +-
 drivers/target/target_core_tmr.c       |  65 +++++-------
 drivers/target/target_core_transport.c | 183 ++++++++++++++++-----------------
 include/target/target_core_base.h      |   1 +
 4 files changed, 117 insertions(+), 135 deletions(-)

Comments

Nicholas A. Bellinger Feb. 9, 2017, 10:43 a.m. UTC | #1
On Wed, 2017-02-08 at 14:24 -0800, Bart Van Assche wrote:
> Instead of invoking target driver callback functions from the
> context that handles an abort or LUN RESET task management function,
> only set the abort flag from that context and perform the actual
> abort handling from the context of the regular command processing
> flow. This approach has the following advantages:
> - The task management function code becomes much easier to read and
>   to verify since the number of potential race conditions against
>   the command processing flow is strongly reduced.
> - It is no longer needed to store the command state into the command
>   itself since that information is no longer needed from the context
>   where a task management function is processed.
> 
> Notes:
> - With this patch applied the CMD_T_ABORTED flag is checked at two points
>   by the target core: just before local execution of a command starts
>   (see also target_execute_cmd()) and also just before the response is
>   sent (see also target_complete_cmd()).
> - A new reference count is introduced that tracks the number of
>   references held by the target driver, namely se_cmd.tgt_ref.
> 

To reiterate the way existing code functions for first and second order
handling, here is the text from the earlier -v2 review to keep the
context fresh.

The quiesce of in-flight se_cmd can happen in three ways:

  - Early submission in target_execute_cmd() to catch CMD_T_STOP
    before se_cmd is submitted to the backend, or
  - waiting for outstanding backend I/O to complete via
    target_complete_cmd() to catch CMD_T_STOP (the typical case when a
    backend device is not responding), or
  - waiting for transport_cmd_check_stop() hand-off of se_cmd from
    target-core to fabric driver response to catch CMD_T_STOP.

This quiesce can happen for an individual se_cmd with ABORT_TASK in
core_tmr_abort_task(). With LUN_RESET it happens for TMRs in
core_tmr_drain_tmr_list() and for the se_cmd descriptors in
core_tmr_drain_state_list() as part of se_device->state_list.

Once CMD_T_STOP is cleared, target_core_tmr.c will drop both the
outstanding references to se_cmd->cmd_kref, as well as its own
reference obtained by __target_check_io_state().

The second order issue, which is what complicates things with the
existing code, is that any of the above first order cases can occur and
block waiting for CMD_T_STOP to be cleared at the _same time_ that the
associated fabric driver se_session is being shut down, which sets
CMD_T_FABRIC_STOP on all of the se_session's se_cmd descriptors.

The session shutdown can be triggered explicitly by the fabric driver,
or by a session reinstatement event (with iscsi-target, for example)
when a host doesn't get a TMR response because backend I/O has not
completed for an extended amount of time.

The dynamic interaction of the first and second order issues here is
what has taken us a long time to get right in upstream code, and it
has been stable for end-users for the past year since the advent of
commit 0f4a94316.

So with that additional context about how/why things work as they do
today in mainline code, my comments are inline below. 

> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Andy Grover <agrover@redhat.com>
> Cc: David Disseldorp <ddiss@suse.de>
> ---
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 161c8b8482db..980d94277c8d 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -192,13 +173,13 @@ void core_tmr_abort_task(
>  			continue;
>  		}
>  
> +		se_cmd->send_abort_response = false;
>  		list_del_init(&se_cmd->se_cmd_list);
>  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
>  
> -		cancel_work_sync(&se_cmd->work);
> -		transport_wait_for_tasks(se_cmd);
> +		while (!wait_for_completion_timeout(&se_cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, se_cmd);
>  

This is still fundamentally wrong, because it allows se_cmd to be
dispatched into the backend driver, and only just waits for completion
from the backend after the fact.

Keep in mind that the existing transport_wait_for_tasks() code sets
CMD_T_STOP and blocks on se_cmd->t_transport_stop_comp, so
target_execute_cmd() catches CMD_T_STOP and prevents the backend
se_cmd->execute_cmd() execution from actually occurring.

Whatever changes you are proposing need to maintain this logic.

> -		transport_cmd_finish_abort(se_cmd, true);
>  		__target_put_sess_cmd(se_cmd);
>  
>  		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
> @@ -295,14 +276,30 @@ static void core_tmr_drain_tmr_list(
>  			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
>  			tmr_p->function, tmr_p->response, cmd->t_state);
>  
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		transport_cmd_finish_abort(cmd, 1);
> +		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, cmd);
>  		__target_put_sess_cmd(cmd);
>  	}
>  }

Likewise here: this doesn't actually prevent TMR execution, it just
waits for it to complete after the fact.

>  static void core_tmr_drain_state_list(
>  	struct se_device *dev,
>  	struct se_cmd *prout_cmd,
> @@ -311,6 +308,7 @@ static void core_tmr_drain_state_list(
>  	struct list_head *preempt_and_abort_list)
>  {
>  	LIST_HEAD(drain_task_list);
> +	struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL;
>  	struct se_session *sess;
>  	struct se_cmd *cmd, *next;
>  	unsigned long flags;
> @@ -365,6 +363,8 @@ static void core_tmr_drain_state_list(
>  
>  		list_move_tail(&cmd->state_list, &drain_task_list);
>  		cmd->state_active = false;
> +		if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas)
> +			cmd->send_abort_response = true;
>  	}
>  	spin_unlock_irqrestore(&dev->execute_task_lock, flags);
>  
> @@ -387,17 +387,8 @@ static void core_tmr_drain_state_list(
>  			(cmd->transport_state & CMD_T_STOP) != 0,
>  			(cmd->transport_state & CMD_T_SENT) != 0);
>  
> -		/*
> -		 * If the command may be queued onto a workqueue cancel it now.
> -		 *
> -		 * This is equivalent to removal from the execute queue in the
> -		 * loop above, but we do it down here given that
> -		 * cancel_work_sync may block.
> -		 */
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		core_tmr_handle_tas_abort(cmd, tas);
> +		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, cmd);
>  		__target_put_sess_cmd(cmd);

And here again too.  You can't just dispatch aborted commands to the
backend and wait until they complete after the fact.

If CMD_T_ABORTED is being requested for the command,
target_execute_cmd() must catch the command's CMD_T_STOP before it is
dispatched into the backend.

What if the backend not responding for a long time is what caused the
host to issue the LUN_RESET in the first place..?

Why would it make sense to keep flooding the device with new commands..?

>  	}
>  }
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 533b584c2806..aac1fc1b2a7c 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -714,16 +739,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  			success = 1;
>  	}
>  
> -	/*
> -	 * Check for case where an explicit ABORT_TASK has been received
> -	 * and transport_wait_for_tasks() will be waiting for completion..
> -	 */
> -	if (cmd->transport_state & CMD_T_ABORTED ||
> -	    cmd->transport_state & CMD_T_STOP) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		complete_all(&cmd->t_transport_stop_comp);
> -		return;
> -	} else if (!success) {

This can't be removed because __transport_wait_for_tasks() needs to
catch the CMD_T_STOP or CMD_T_ABORTED quiesce here.

Otherwise, for !success, target_complete_failure_work() executes and
actually queues the response, but only CMD_T_STOP (and not
CMD_T_ABORTED!) is checked by transport_cmd_check_stop_to_fabric() after
the response has been dispatched.

> +	if (!success) {
>  		INIT_WORK(&cmd->work, target_complete_failure_work);
>  	} else {
>  		INIT_WORK(&cmd->work, target_complete_ok_work);
> @@ -733,6 +749,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
>  	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
>  
> +queue_work:
>  	if (cmd->se_cmd_flags & SCF_USE_CPUID)
>  		queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
>  	else
> @@ -1215,6 +1232,7 @@ void transport_init_se_cmd(
>  	INIT_LIST_HEAD(&cmd->state_list);
>  	init_completion(&cmd->t_transport_stop_comp);
>  	init_completion(&cmd->cmd_wait_comp);
> +	init_completion(&cmd->complete);
>  	spin_lock_init(&cmd->t_state_lock);
>  	kref_init(&cmd->cmd_kref);
>  	atomic_set(&cmd->tgt_ref, 1);
> @@ -1886,6 +1904,7 @@ void target_execute_cmd(struct se_cmd *cmd)
>  	spin_lock_irq(&cmd->t_state_lock);
>  	if (__transport_check_aborted_status(cmd, 1)) {
>  		spin_unlock_irq(&cmd->t_state_lock);
> +		transport_handle_abort(cmd);
>  		return;
>  	}

__transport_check_aborted_status() must honor delayed TAS, and allow the
fabric to complete WRITE transfer before proceeding with abort.

There are assumptions about this all over the driver code.

>  	if (cmd->transport_state & CMD_T_STOP) {
> @@ -2494,15 +2513,11 @@ static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas)
>  
>  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  {
> -	int ret = 0;
>  	bool aborted = false, tas = false;
>  
>  	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
>  		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	} else {
>  		if (wait_for_tasks)
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> @@ -2516,23 +2531,12 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  
>  		if (cmd->se_lun)
>  			transport_lun_remove_cmd(cmd);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	}
> -	/*
> -	 * If the task has been internally aborted due to TMR ABORT_TASK
> -	 * or LUN_RESET, target_core_tmr.c is responsible for performing
> -	 * the remaining calls to target_put_sess_cmd(), and not the
> -	 * callers of this function.
> -	 */
>  	if (aborted) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
>  		wait_for_completion(&cmd->cmd_wait_comp);
> -		transport_put_cmd(cmd);
> -		ret = 1;
>  	}
> -	return ret;
> +	return transport_put_cmd(cmd);

As mentioned in patch #14, there is no way this can possibly work.

The wait_for_completion(&cmd->cmd_wait_comp) is only woken up when
transport_put_cmd() -> kref_put() of se_cmd->cmd_kref reaches zero, so
there is no way the wait_for_completion during CMD_T_ABORTED can ever
make forward progress.

>  }
>  EXPORT_SYMBOL(transport_generic_free_cmd);
>  
> @@ -2644,6 +2648,43 @@ int target_put_sess_cmd(struct se_cmd *se_cmd)
>  }
>  EXPORT_SYMBOL(target_put_sess_cmd);
>  
> +static const char *data_dir_name(enum dma_data_direction d)
> +{
> +	switch (d) {
> +	case DMA_BIDIRECTIONAL:	return "BIDI";
> +	case DMA_TO_DEVICE:	return "WRITE";
> +	case DMA_FROM_DEVICE:	return "READ";
> +	case DMA_NONE:		return "NONE";
> +	default:		return "(?)";
> +	}
> +}
> +
> +static const char *cmd_state_name(enum transport_state_table t)
> +{
> +	switch (t) {
> +	case TRANSPORT_NO_STATE:	return "NO_STATE";
> +	case TRANSPORT_NEW_CMD:		return "NEW_CMD";
> +	case TRANSPORT_WRITE_PENDING:	return "WRITE_PENDING";
> +	case TRANSPORT_PROCESSING:	return "PROCESSING";
> +	case TRANSPORT_COMPLETE:	return "COMPLETE";
> +	case TRANSPORT_ISTATE_PROCESSING:
> +					return "ISTATE_PROCESSING";
> +	case TRANSPORT_COMPLETE_QF_WP:	return "COMPLETE_QF_WP";
> +	case TRANSPORT_COMPLETE_QF_OK:	return "COMPLETE_QF_OK";
> +	default:			return "?";
> +	}
> +}
> +
> +void target_show_cmd(const char *ctxt, struct se_cmd *cmd)
> +{
> +	pr_debug("%s: still waiting for %s with tag %#llx; dir %s i_state %d t_state %s len %d tgt_ref %d refcnt %d\n",
> +		 ctxt, cmd->se_cmd_flags & SCF_SCSI_TMR_CDB ? "tmf" : "cmd",
> +		 cmd->tag, data_dir_name(cmd->data_direction),
> +		 cmd->se_tfo->get_cmd_state(cmd), cmd_state_name(cmd->t_state),
> +		 cmd->data_length, atomic_read(&cmd->tgt_ref),
> +		 atomic_read(&cmd->cmd_kref.refcount));
> +}
> +

Actually, this should be a standalone patch that is used by
__transport_wait_for_tasks() during CMD_T_STOP while waiting for
se_cmd->t_transport_stop_comp to complete.

I'd be happy to pick that up for existing code.

> @@ -3043,47 +3075,6 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
>  }
>  EXPORT_SYMBOL(transport_check_aborted_status);
>  
> -void transport_send_task_abort(struct se_cmd *cmd)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -	/*
> -	 * If there are still expected incoming fabric WRITEs, we wait
> -	 * until until they have completed before sending a TASK_ABORTED
> -	 * response.  This response with TASK_ABORTED status will be
> -	 * queued back to fabric module by transport_check_aborted_status().
> -	 */
> -	if (cmd->data_direction == DMA_TO_DEVICE) {
> -		if (cmd->se_tfo->write_pending_status(cmd) != 0) {
> -			spin_lock_irqsave(&cmd->t_state_lock, flags);
> -			if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) {
> -				spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -				goto send_abort;
> -			}
> -			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
> -			spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -			return;
> -		}
> -	}

Getting rid of the ->write_pending_status() here and DELAYED_TAS
completely breaks WRITEs during CMD_T_ABORTED for iscsi-target and
qla2xxx, when the outstanding WRITE transfer _must be_ allowed to
complete before proceeding with abort.

See comments in patch #23 why this can't be removed.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Feb. 9, 2017, 7:23 p.m. UTC | #2
On Thu, 2017-02-09 at 02:43 -0800, Nicholas A. Bellinger wrote:
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> > index 161c8b8482db..980d94277c8d 100644
> > --- a/drivers/target/target_core_tmr.c
> > +++ b/drivers/target/target_core_tmr.c
> > @@ -192,13 +173,13 @@ void core_tmr_abort_task(
> >  			continue;
> >  		}
> >  
> > +		se_cmd->send_abort_response = false;
> >  		list_del_init(&se_cmd->se_cmd_list);
> >  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
> >  
> > -		cancel_work_sync(&se_cmd->work);
> > -		transport_wait_for_tasks(se_cmd);
> > +		while (!wait_for_completion_timeout(&se_cmd->complete, 180 * HZ))
> > +			target_show_cmd(__func__, se_cmd);
> >  
> 
> This is still fundamentally wrong, because it allows se_cmd to be
> dispatched into the backend driver, and only just waits for completion
> from the backend after the fact.

I disagree. The CMD_T_ABORTED flag is tested in the command execution
path everywhere CMD_T_STOP is tested, so setting only CMD_T_ABORTED is
sufficient. It is not necessary to set both CMD_T_ABORTED and CMD_T_STOP.

> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> > index 533b584c2806..aac1fc1b2a7c 100644
> > --- a/drivers/target/target_core_transport.c
> > +++ b/drivers/target/target_core_transport.c
> > @@ -714,16 +739,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
> >  			success = 1;
> >  	}
> >  
> > -	/*
> > -	 * Check for case where an explicit ABORT_TASK has been received
> > -	 * and transport_wait_for_tasks() will be waiting for completion..
> > -	 */
> > -	if (cmd->transport_state & CMD_T_ABORTED ||
> > -	    cmd->transport_state & CMD_T_STOP) {
> > -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> > -		complete_all(&cmd->t_transport_stop_comp);
> > -		return;
> > -	} else if (!success) {
> 
> This can't be removed because __transport_wait_for_tasks() needs to
> catch the CMD_T_STOP or CMD_T_ABORTED quiesce here.

You have quoted my patch in such a way that it becomes misleading. I have
*not* removed the CMD_T_ABORTED and CMD_T_STOP tests. I have moved these
up. You should have noticed that.

> > @@ -1886,6 +1904,7 @@ void target_execute_cmd(struct se_cmd *cmd)
> >  	spin_lock_irq(&cmd->t_state_lock);
> >  	if (__transport_check_aborted_status(cmd, 1)) {
> >  		spin_unlock_irq(&cmd->t_state_lock);
> > +		transport_handle_abort(cmd);
> >  		return;
> >  	}
> 
> __transport_check_aborted_status() must honor delayed TAS, and allow the
> fabric to complete WRITE transfer before proceeding with abort.

There is no such concept as "delayed TAS" in the SCSI spec. TAS stands
for "task aborted status". If this bit is set, a TASK ABORTED status has
to be sent back to the initiator if the abort was triggered from another
I_T nexus than the one through which the SCSI command was received. My
patch does that.

> > -	/*
> > -	 * If the task has been internally aborted due to TMR ABORT_TASK
> > -	 * or LUN_RESET, target_core_tmr.c is responsible for performing
> > -	 * the remaining calls to target_put_sess_cmd(), and not the
> > -	 * callers of this function.
> > -	 */
> >  	if (aborted) {
> >  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
> >  		wait_for_completion(&cmd->cmd_wait_comp);
> > -		transport_put_cmd(cmd);
> > -		ret = 1;
> >  	}
> > -	return ret;
> > +	return transport_put_cmd(cmd);
> 
> As mentioned in patch #14, there is no way this can possibly work.
> 
> The wait_for_completion(&cmd->cmd_wait_comp) is woken up when
> transport_put_cmd() -> kref_put() of se_cmd->cmd_kref reaches zero, so
> there is no way the wait_for_completion during CMD_T_ABORTED can ever
> make forward progress.

I will remove cmd_wait_comp.

> Getting rid of the ->write_pending_status() here and DELAYED_TAS
> completely breaks WRITEs during CMD_T_ABORTED for iscsi-target and
> qla2xxx, when the outstanding WRITE transfer _must be_ allowed to
> complete before proceeding with abort.

My patch lets outstanding writes complete before proceeding with an
abort.

Bart.

Patch

diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index 48de996ad28d..944085bff6af 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -136,7 +136,6 @@  int	init_se_kmem_caches(void);
 void	release_se_kmem_caches(void);
 u32	scsi_get_new_index(scsi_index_t);
 void	transport_subsystem_check_init(void);
-void	transport_cmd_finish_abort(struct se_cmd *, int);
 unsigned char *transport_dump_cmd_direction(struct se_cmd *);
 void	transport_dump_dev_state(struct se_device *, char *, int *);
 void	transport_dump_dev_info(struct se_device *, struct se_lun *,
@@ -147,7 +146,7 @@  int	transport_dump_vpd_ident_type(struct t10_vpd *, unsigned char *, int);
 int	transport_dump_vpd_ident(struct t10_vpd *, unsigned char *, int);
 void	transport_clear_lun_ref(struct se_lun *);
 int	__target_put_sess_cmd(struct se_cmd *se_cmd);
-void	transport_send_task_abort(struct se_cmd *);
+void target_show_cmd(const char *ctxt, struct se_cmd *cmd);
 sense_reason_t	target_cmd_size_check(struct se_cmd *cmd, unsigned int size);
 void	target_qf_do_work(struct work_struct *work);
 bool	target_check_wce(struct se_device *dev);
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 161c8b8482db..980d94277c8d 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -75,25 +75,6 @@  void core_tmr_release_req(struct se_tmr_req *tmr)
 	kfree(tmr);
 }
 
-static void core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas)
-{
-	unsigned long flags;
-	bool remove = true, send_tas;
-	/*
-	 * TASK ABORTED status (TAS) bit support
-	 */
-	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	send_tas = (cmd->transport_state & CMD_T_TAS);
-	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-
-	if (send_tas) {
-		remove = false;
-		transport_send_task_abort(cmd);
-	}
-
-	transport_cmd_finish_abort(cmd, remove);
-}
-
 static int target_check_cdb_and_preempt(struct list_head *list,
 		struct se_cmd *cmd)
 {
@@ -192,13 +173,13 @@  void core_tmr_abort_task(
 			continue;
 		}
 
+		se_cmd->send_abort_response = false;
 		list_del_init(&se_cmd->se_cmd_list);
 		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
 
-		cancel_work_sync(&se_cmd->work);
-		transport_wait_for_tasks(se_cmd);
+		while (!wait_for_completion_timeout(&se_cmd->complete, 180 * HZ))
+			target_show_cmd(__func__, se_cmd);
 
-		transport_cmd_finish_abort(se_cmd, true);
 		__target_put_sess_cmd(se_cmd);
 
 		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
@@ -295,14 +276,30 @@  static void core_tmr_drain_tmr_list(
 			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
 			tmr_p->function, tmr_p->response, cmd->t_state);
 
-		cancel_work_sync(&cmd->work);
-		transport_wait_for_tasks(cmd);
-
-		transport_cmd_finish_abort(cmd, 1);
+		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
+			target_show_cmd(__func__, cmd);
 		__target_put_sess_cmd(cmd);
 	}
 }
 
+/**
+ * core_tmr_drain_state_list() - abort SCSI commands associated with a device
+ *
+ * @dev:       Device for which to abort outstanding SCSI commands.
+ * @prout_cmd: Pointer to the SCSI PREEMPT AND ABORT if this function is called
+ *             to realize the PREEMPT AND ABORT functionality.
+ * @tmr_sess:  Session through which the LUN RESET has been received.
+ * @tas:       Task Aborted Status (TAS) bit from the SCSI control mode page.
+ *             A quote from SPC-4, paragraph "7.5.10 Control mode page":
+ *             "A task aborted status (TAS) bit set to zero specifies that
+ *             aborted commands shall be terminated by the device server
+ *             without any response to the application client. A TAS bit set
+ *             to one specifies that commands aborted by the actions of an I_T
+ *             nexus other than the I_T nexus on which the command was
+ *             received shall be completed with TASK ABORTED status."
+ * @preempt_and_abort_list: For the PREEMPT AND ABORT functionality, a list
+ *             with registrations that will be preempted.
+ */
 static void core_tmr_drain_state_list(
 	struct se_device *dev,
 	struct se_cmd *prout_cmd,
@@ -311,6 +308,7 @@  static void core_tmr_drain_state_list(
 	struct list_head *preempt_and_abort_list)
 {
 	LIST_HEAD(drain_task_list);
+	struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL;
 	struct se_session *sess;
 	struct se_cmd *cmd, *next;
 	unsigned long flags;
@@ -365,6 +363,8 @@  static void core_tmr_drain_state_list(
 
 		list_move_tail(&cmd->state_list, &drain_task_list);
 		cmd->state_active = false;
+		if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas)
+			cmd->send_abort_response = true;
 	}
 	spin_unlock_irqrestore(&dev->execute_task_lock, flags);
 
@@ -387,17 +387,8 @@  static void core_tmr_drain_state_list(
 			(cmd->transport_state & CMD_T_STOP) != 0,
 			(cmd->transport_state & CMD_T_SENT) != 0);
 
-		/*
-		 * If the command may be queued onto a workqueue cancel it now.
-		 *
-		 * This is equivalent to removal from the execute queue in the
-		 * loop above, but we do it down here given that
-		 * cancel_work_sync may block.
-		 */
-		cancel_work_sync(&cmd->work);
-		transport_wait_for_tasks(cmd);
-
-		core_tmr_handle_tas_abort(cmd, tas);
+		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
+			target_show_cmd(__func__, cmd);
 		__target_put_sess_cmd(cmd);
 	}
 }
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 533b584c2806..aac1fc1b2a7c 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -644,25 +644,6 @@  static void transport_lun_remove_cmd(struct se_cmd *cmd)
 		percpu_ref_put(&lun->lun_ref);
 }
 
-void transport_cmd_finish_abort(struct se_cmd *cmd, int remove)
-{
-	bool ack_kref = (cmd->se_cmd_flags & SCF_ACK_KREF);
-
-	if (cmd->se_cmd_flags & SCF_SE_LUN_CMD)
-		transport_lun_remove_cmd(cmd);
-	/*
-	 * Allow the fabric driver to unmap any resources before
-	 * releasing the descriptor via TFO->release_cmd()
-	 */
-	if (remove)
-		cmd->se_tfo->aborted_task(cmd);
-
-	if (transport_cmd_check_stop_to_fabric(cmd))
-		return;
-	if (remove && ack_kref)
-		transport_put_cmd(cmd);
-}
-
 static void target_complete_failure_work(struct work_struct *work)
 {
 	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
@@ -694,14 +675,58 @@  static unsigned char *transport_get_sense_buffer(struct se_cmd *cmd)
 	return cmd->sense_buffer;
 }
 
+static void transport_handle_abort(struct se_cmd *cmd)
+{
+	bool ack_kref = cmd->se_cmd_flags & SCF_ACK_KREF;
+
+	transport_lun_remove_cmd(cmd);
+
+	if (cmd->send_abort_response) {
+		cmd->scsi_status = SAM_STAT_TASK_ABORTED;
+		pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
+			 cmd->t_task_cdb[0], cmd->tag);
+		trace_target_cmd_complete(cmd);
+		cmd->se_tfo->queue_status(cmd);
+		transport_cmd_check_stop_to_fabric(cmd);
+	} else {
+		/*
+		 * Allow the fabric driver to unmap any resources before
+		 * releasing the descriptor via TFO->release_cmd()
+		 */
+		cmd->se_tfo->aborted_task(cmd);
+		/*
+		 * To do: establish a unit attention condition on the I_T
+		 * nexus associated with cmd. See also the paragraph "Aborting
+		 * commands" in SAM.
+		 */
+		if (transport_cmd_check_stop_to_fabric(cmd) == 0 && ack_kref)
+			transport_put_cmd(cmd);
+	}
+}
+
+static void target_abort_work(struct work_struct *work)
+{
+	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
+
+	transport_handle_abort(cmd);
+}
+
+/* May be called from interrupt context so must not sleep. */
 void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 {
 	struct se_device *dev = cmd->se_dev;
 	int success = scsi_status == GOOD;
 	unsigned long flags;
 
-	cmd->scsi_status = scsi_status;
+	if (cmd->transport_state & CMD_T_ABORTED) {
+		INIT_WORK(&cmd->work, target_abort_work);
+		goto queue_work;
+	} else if (cmd->transport_state & CMD_T_STOP) {
+		complete_all(&cmd->t_transport_stop_comp);
+		return;
+	}
 
+	cmd->scsi_status = scsi_status;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
 	cmd->transport_state &= ~CMD_T_BUSY;
@@ -714,16 +739,7 @@  void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 			success = 1;
 	}
 
-	/*
-	 * Check for case where an explicit ABORT_TASK has been received
-	 * and transport_wait_for_tasks() will be waiting for completion..
-	 */
-	if (cmd->transport_state & CMD_T_ABORTED ||
-	    cmd->transport_state & CMD_T_STOP) {
-		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-		complete_all(&cmd->t_transport_stop_comp);
-		return;
-	} else if (!success) {
+	if (!success) {
 		INIT_WORK(&cmd->work, target_complete_failure_work);
 	} else {
 		INIT_WORK(&cmd->work, target_complete_ok_work);
@@ -733,6 +749,7 @@  void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
 	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 
+queue_work:
 	if (cmd->se_cmd_flags & SCF_USE_CPUID)
 		queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
 	else
@@ -1215,6 +1232,7 @@  void transport_init_se_cmd(
 	INIT_LIST_HEAD(&cmd->state_list);
 	init_completion(&cmd->t_transport_stop_comp);
 	init_completion(&cmd->cmd_wait_comp);
+	init_completion(&cmd->complete);
 	spin_lock_init(&cmd->t_state_lock);
 	kref_init(&cmd->cmd_kref);
 	atomic_set(&cmd->tgt_ref, 1);
@@ -1886,6 +1904,7 @@  void target_execute_cmd(struct se_cmd *cmd)
 	spin_lock_irq(&cmd->t_state_lock);
 	if (__transport_check_aborted_status(cmd, 1)) {
 		spin_unlock_irq(&cmd->t_state_lock);
+		transport_handle_abort(cmd);
 		return;
 	}
 	if (cmd->transport_state & CMD_T_STOP) {
@@ -2494,15 +2513,11 @@  static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas)
 
 int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 {
-	int ret = 0;
 	bool aborted = false, tas = false;
 
 	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
 		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
 			target_wait_free_cmd(cmd, &aborted, &tas);
-
-		if (!aborted || tas)
-			ret = transport_put_cmd(cmd);
 	} else {
 		if (wait_for_tasks)
 			target_wait_free_cmd(cmd, &aborted, &tas);
@@ -2516,23 +2531,12 @@  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 
 		if (cmd->se_lun)
 			transport_lun_remove_cmd(cmd);
-
-		if (!aborted || tas)
-			ret = transport_put_cmd(cmd);
 	}
-	/*
-	 * If the task has been internally aborted due to TMR ABORT_TASK
-	 * or LUN_RESET, target_core_tmr.c is responsible for performing
-	 * the remaining calls to target_put_sess_cmd(), and not the
-	 * callers of this function.
-	 */
 	if (aborted) {
 		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
 		wait_for_completion(&cmd->cmd_wait_comp);
-		transport_put_cmd(cmd);
-		ret = 1;
 	}
-	return ret;
+	return transport_put_cmd(cmd);
 }
 EXPORT_SYMBOL(transport_generic_free_cmd);
 
@@ -2644,6 +2648,43 @@  int target_put_sess_cmd(struct se_cmd *se_cmd)
 }
 EXPORT_SYMBOL(target_put_sess_cmd);
 
+static const char *data_dir_name(enum dma_data_direction d)
+{
+	switch (d) {
+	case DMA_BIDIRECTIONAL:	return "BIDI";
+	case DMA_TO_DEVICE:	return "WRITE";
+	case DMA_FROM_DEVICE:	return "READ";
+	case DMA_NONE:		return "NONE";
+	default:		return "(?)";
+	}
+}
+
+static const char *cmd_state_name(enum transport_state_table t)
+{
+	switch (t) {
+	case TRANSPORT_NO_STATE:	return "NO_STATE";
+	case TRANSPORT_NEW_CMD:		return "NEW_CMD";
+	case TRANSPORT_WRITE_PENDING:	return "WRITE_PENDING";
+	case TRANSPORT_PROCESSING:	return "PROCESSING";
+	case TRANSPORT_COMPLETE:	return "COMPLETE";
+	case TRANSPORT_ISTATE_PROCESSING:
+					return "ISTATE_PROCESSING";
+	case TRANSPORT_COMPLETE_QF_WP:	return "COMPLETE_QF_WP";
+	case TRANSPORT_COMPLETE_QF_OK:	return "COMPLETE_QF_OK";
+	default:			return "?";
+	}
+}
+
+void target_show_cmd(const char *ctxt, struct se_cmd *cmd)
+{
+	pr_debug("%s: still waiting for %s with tag %#llx; dir %s i_state %d t_state %s len %d tgt_ref %d refcnt %d\n",
+		 ctxt, cmd->se_cmd_flags & SCF_SCSI_TMR_CDB ? "tmf" : "cmd",
+		 cmd->tag, data_dir_name(cmd->data_direction),
+		 cmd->se_tfo->get_cmd_state(cmd), cmd_state_name(cmd->t_state),
+		 cmd->data_length, atomic_read(&cmd->tgt_ref),
+		 atomic_read(&cmd->cmd_kref.refcount));
+}
+
 /* target_sess_cmd_list_set_waiting - Flag all commands in
  *         sess_cmd_list to complete cmd_wait_comp.  Set
  *         sess_tearing_down so no more commands are queued.
@@ -3017,16 +3058,7 @@  static int __transport_check_aborted_status(struct se_cmd *cmd, int send_status)
 		return 1;
 	}
 
-	pr_debug("Sending delayed SAM_STAT_TASK_ABORTED status for CDB:"
-		" 0x%02x ITT: 0x%08llx\n", cmd->t_task_cdb[0], cmd->tag);
-
 	cmd->se_cmd_flags &= ~SCF_SEND_DELAYED_TAS;
-	cmd->scsi_status = SAM_STAT_TASK_ABORTED;
-	trace_target_cmd_complete(cmd);
-
-	spin_unlock_irq(&cmd->t_state_lock);
-	cmd->se_tfo->queue_status(cmd);
-	spin_lock_irq(&cmd->t_state_lock);
 
 	return 1;
 }
@@ -3043,47 +3075,6 @@  int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
 }
 EXPORT_SYMBOL(transport_check_aborted_status);
 
-void transport_send_task_abort(struct se_cmd *cmd)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
-		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-		return;
-	}
-	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-
-	/*
-	 * If there are still expected incoming fabric WRITEs, we wait
-	 * until until they have completed before sending a TASK_ABORTED
-	 * response.  This response with TASK_ABORTED status will be
-	 * queued back to fabric module by transport_check_aborted_status().
-	 */
-	if (cmd->data_direction == DMA_TO_DEVICE) {
-		if (cmd->se_tfo->write_pending_status(cmd) != 0) {
-			spin_lock_irqsave(&cmd->t_state_lock, flags);
-			if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) {
-				spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-				goto send_abort;
-			}
-			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
-			spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-			return;
-		}
-	}
-send_abort:
-	cmd->scsi_status = SAM_STAT_TASK_ABORTED;
-
-	transport_lun_remove_cmd(cmd);
-
-	pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
-		 cmd->t_task_cdb[0], cmd->tag);
-
-	trace_target_cmd_complete(cmd);
-	cmd->se_tfo->queue_status(cmd);
-}
-
 static void target_tmr_work(struct work_struct *work)
 {
 	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index c84e5eecb94d..e5e0fd852ff5 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -444,6 +444,7 @@  struct se_cmd {
 	unsigned		cmd_wait_set:1;
 	unsigned		unknown_data_length:1;
 	bool			state_active:1;
+	bool			send_abort_response:1;
 	u64			tag; /* SAM command identifier aka task tag */
 	/* Delay for ALUA Active/NonOptimized state access in milliseconds */
 	int			alua_nonop_delay;