
[v2,19/36] target: Make ABORT and LUN RESET handling synchronous

Message ID 20170202005853.23456-20-bart.vanassche@sandisk.com (mailing list archive)
State Superseded

Commit Message

Bart Van Assche Feb. 2, 2017, 12:58 a.m. UTC
Instead of invoking target driver callback functions from the
context that handles an abort or LUN RESET task management function,
only set the abort flag from that context and perform the actual
abort handling from the context of the regular command processing
flow. This approach has the following advantages:
- The task management function code becomes much easier to read and
  to verify since the number of potential race conditions against
  the command processing flow is strongly reduced.
- It is no longer needed to store the command state into the command
  itself since that information is no longer needed from the context
  where a task management function is processed.

Notes:
- With this patch applied the CMD_T_ABORTED flag is checked at two points
  by the target core: just before local execution of a command starts
  (see also target_execute_cmd()) and also just before the response is
  sent (see also target_complete_cmd()).
- Two .release_cmd() calls have been changed into transport_put_cmd()
  calls to ensure that the 'finished' completion is set before
  .release_cmd() is called.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
---
 drivers/target/target_core_internal.h  |   2 -
 drivers/target/target_core_tmr.c       |  73 ++++++++--------
 drivers/target/target_core_transport.c | 153 +++++++++++++--------------------
 include/target/target_core_base.h      |   2 +
 4 files changed, 95 insertions(+), 135 deletions(-)

Comments

Nicholas A. Bellinger Feb. 7, 2017, 3:40 p.m. UTC | #1
On Wed, 2017-02-01 at 16:58 -0800, Bart Van Assche wrote:
> Instead of invoking target driver callback functions from the
> context that handles an abort or LUN RESET task management function,
> only set the abort flag from that context and perform the actual
> abort handling from the context of the regular command processing
> flow. This approach has the following advantages:
> - The task management function code becomes much easier to read and
>   to verify since the number of potential race conditions against
>   the command processing flow is strongly reduced.
> - It is no longer needed to store the command state into the command
>   itself since that information is no longer needed from the context
>   where a task management function is processed.
> 
> Notes:
> - With this patch applied the CMD_T_ABORTED flag is checked at two points
>   by the target core: just before local execution of a command starts
>   (see also target_execute_cmd()) and also just before the response is
>   sent (see also target_complete_cmd()).
> - Two .release_cmd() calls have been changed into transport_put_cmd()
>   calls to ensure that the 'finished' completion is set before
>   .release_cmd() is called.
> 

And here is the meat of the series.

Before diving into the specific issues with the patch, I'd like to give
some context on how the existing code works today in v4.10, and
highlight the first order and second order issues involved.

The first order issue is that the ABORT_TASK and LUN_RESET code in
target_core_tmr.c obtains a se_cmd->cmd_kref and blocks in
transport_wait_for_tasks() ->
wait_for_completion(&se_cmd->t_transport_stop_comp), waiting to quiesce
an outstanding in-flight se_cmd descriptor.

This quiesce of in-flight se_cmd can happen in three ways:

  - Early submission in target_execute_cmd() to catch CMD_T_STOP
    before se_cmd is submitted to the backend, or
  - waiting for outstanding backend I/O to complete via
    target_complete_cmd() to catch CMD_T_STOP (the typical case when a
    backend device is not responding), or
  - waiting for transport_cmd_check_stop() hand-off of se_cmd from
    target-core to fabric driver response to catch CMD_T_STOP.

As you know, in the target_core_tmr.c code this quiesce can happen for a
se_cmd with ABORT_TASK in core_tmr_abort_task(); with LUN_RESET it
happens for TMRs in core_tmr_drain_tmr_list() and for se_cmd descriptors
in core_tmr_drain_state_list() as part of se_device->state_list.

Once CMD_T_STOP is cleared, target_core_tmr.c will drop both the
outstanding references to se_cmd->cmd_kref, as well as its own
reference obtained by __target_check_io_state().

The second order issue, which is what complicates things in the
existing code, is that any of the above first order cases can occur and
block waiting for CMD_T_STOP to be cleared at the _same time_ that the
associated fabric driver se_session is being shut down, which sets
CMD_T_FABRIC_STOP on all of the se_session's se_cmd descriptors.

Session shutdown can be triggered explicitly by the fabric driver, or by
a session reinstatement event (with iscsi-target, for example) when a
host does not get a TMR response because backend I/O has not completed
for an extended amount of time.

The dynamic interaction of the first and second order issues here is
what has taken us a long time to get right in upstream code, and which
has been stable for end-users the past year since the advent of commit
0f4a94316.

So with that additional context about how/why things work as they do
today in mainline code, my comments are inline below.  As mentioned
earlier, please respond in-line to the specific comments.

> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Andy Grover <agrover@redhat.com>
> Cc: David Disseldorp <ddiss@suse.de>
> ---
>  drivers/target/target_core_internal.h  |   2 -
>  drivers/target/target_core_tmr.c       |  73 ++++++++--------
>  drivers/target/target_core_transport.c | 153 +++++++++++++--------------------
>  include/target/target_core_base.h      |   2 +
>  4 files changed, 95 insertions(+), 135 deletions(-)
> 
> diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
> index 9ab7090f7c83..6f6cae81ab8e 100644
> --- a/drivers/target/target_core_internal.h
> +++ b/drivers/target/target_core_internal.h
> @@ -136,7 +136,6 @@ int	init_se_kmem_caches(void);
>  void	release_se_kmem_caches(void);
>  u32	scsi_get_new_index(scsi_index_t);
>  void	transport_subsystem_check_init(void);
> -void	transport_cmd_finish_abort(struct se_cmd *, int);
>  unsigned char *transport_dump_cmd_direction(struct se_cmd *);
>  void	transport_dump_dev_state(struct se_device *, char *, int *);
>  void	transport_dump_dev_info(struct se_device *, struct se_lun *,
> @@ -146,7 +145,6 @@ int	transport_dump_vpd_assoc(struct t10_vpd *, unsigned char *, int);
>  int	transport_dump_vpd_ident_type(struct t10_vpd *, unsigned char *, int);
>  int	transport_dump_vpd_ident(struct t10_vpd *, unsigned char *, int);
>  void	transport_clear_lun_ref(struct se_lun *);
> -void	transport_send_task_abort(struct se_cmd *);
>  sense_reason_t	target_cmd_size_check(struct se_cmd *cmd, unsigned int size);
>  void	target_qf_do_work(struct work_struct *work);
>  bool	target_check_wce(struct se_device *dev);
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 367799b4dde1..bc07e059fb22 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -75,25 +75,6 @@ void core_tmr_release_req(struct se_tmr_req *tmr)
>  	kfree(tmr);
>  }
>  
> -static void core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas)
> -{
> -	unsigned long flags;
> -	bool remove = true, send_tas;
> -	/*
> -	 * TASK ABORTED status (TAS) bit support
> -	 */
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	send_tas = (cmd->transport_state & CMD_T_TAS);
> -	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -	if (send_tas) {
> -		remove = false;
> -		transport_send_task_abort(cmd);
> -	}
> -
> -	transport_cmd_finish_abort(cmd, remove);
> -}
> -
>  static int target_check_cdb_and_preempt(struct list_head *list,
>  		struct se_cmd *cmd)
>  {
> @@ -192,13 +173,13 @@ void core_tmr_abort_task(
>  			continue;
>  		}
>  
> -		list_del_init(&se_cmd->se_cmd_list);
> +		se_cmd->send_abort_response = false;
>  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
>  
> -		cancel_work_sync(&se_cmd->work);
> -		transport_wait_for_tasks(se_cmd);
> +		while (!wait_for_completion_timeout(&se_cmd->finished, 180 * HZ))
> +			pr_debug("ABORT TASK: still waiting for ITT %#llx\n",
> +				 ref_tag);
>  
> -		transport_cmd_finish_abort(se_cmd, true);
>  		target_put_sess_cmd(se_cmd);
>  

So instead of using the common transport_wait_for_tasks() across all of
the quiesce cases, this introduces a new se_cmd->finished completion to
separate out normal CMD_T_STOP shutdown from TMR ABORT_TASK.

Not a bad idea.  However, there is a basic fatal flaw in this approach
with the patch as-is that results in se_cmd->finished never being able
to complete for any se_cmd with CMD_T_ABORTED.

For the above, core_tmr_abort_task() calls __target_check_io_state() to
obtain a se_cmd->cmd_kref if the descriptor is able to be aborted, and
sets CMD_T_ABORTED.  Then, wait_for_completion_timeout() sits in a while
loop waiting for se_cmd->finished to be completed.

However, the complete_all(&se_cmd->finished) you've added below is only
invoked from the final se_cmd->cmd_kref callback in
target_release_cmd_kref().

Which means core_tmr_abort_task() will always be stuck indefinitely on
se_cmd->finished above, because the subsequent target_put_sess_cmd()
after the wait_for_completion_timeout() is the caller responsible for
doing the final kref_put(&se_cmd->cmd_kref, target_release_cmd_kref)
to perform complete_all(&se_cmd->finished).

>  		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
> @@ -295,14 +276,31 @@ static void core_tmr_drain_tmr_list(
>  			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
>  			tmr_p->function, tmr_p->response, cmd->t_state);
>  
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		transport_cmd_finish_abort(cmd, 1);
> +		while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ))
> +			pr_debug("LUN RESET: still waiting for ITT %#llx\n",
> +				 cmd->tag);
>  		target_put_sess_cmd(cmd);
>  	}
>  }
>  

Same thing here for TMR abort.

There is no way for target_release_cmd_kref() to ever invoke
complete_all(&se_cmd->finished) to make forward progress beyond the
wait_for_completion_timeout() here until the subsequent
target_put_sess_cmd() is called to drop se_cmd->cmd_kref to zero.

> +/**
> + * core_tmr_drain_state_list() - Abort SCSI commands associated with a device.
> + *
> + * @dev:       Device for which to abort outstanding SCSI commands.
> + * @prout_cmd: Pointer to the SCSI PREEMPT AND ABORT if this function is called
> + *             to realize the PREEMPT AND ABORT functionality.
> + * @tmr_sess:  Session through which the LUN RESET has been received.
> + * @tas:       Task Aborted Status (TAS) bit from the SCSI control mode page.
> + *             A quote from SPC-4, paragraph "7.5.10 Control mode page":
> + *             "A task aborted status (TAS) bit set to zero specifies that
> + *             aborted commands shall be terminated by the device server
> + *             without any response to the application client. A TAS bit set
> + *             to one specifies that commands aborted by the actions of an I_T
> + *             nexus other than the I_T nexus on which the command was
> + *             received shall be completed with TASK ABORTED status."
> + * @preempt_and_abort_list: For the PREEMPT AND ABORT functionality, a list
> + *             with registrations that will be preempted.
> + */
>  static void core_tmr_drain_state_list(
>  	struct se_device *dev,
>  	struct se_cmd *prout_cmd,
> @@ -311,6 +309,7 @@ static void core_tmr_drain_state_list(
>  	struct list_head *preempt_and_abort_list)
>  {
>  	LIST_HEAD(drain_task_list);
> +	struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL;
>  	struct se_session *sess;
>  	struct se_cmd *cmd, *next;
>  	unsigned long flags;
> @@ -340,6 +339,10 @@ static void core_tmr_drain_state_list(
>  	 */
>  	spin_lock_irqsave(&dev->execute_task_lock, flags);
>  	list_for_each_entry_safe(cmd, next, &dev->state_list, state_list) {
> +		/* Skip task management functions. */
> +		if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
> +			continue;
> +

TMRs are never added to se_device->state_list, so this extra check is
unnecessary.

>  		/*
>  		 * For PREEMPT_AND_ABORT usage, only process commands
>  		 * with a matching reservation key.
> @@ -365,6 +368,8 @@ static void core_tmr_drain_state_list(
>  
>  		list_move_tail(&cmd->state_list, &drain_task_list);
>  		cmd->state_active = false;
> +		if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas)
> +			cmd->send_abort_response = true;
>  	}
>  	spin_unlock_irqrestore(&dev->execute_task_lock, flags);
>  
> @@ -387,17 +392,9 @@ static void core_tmr_drain_state_list(
>  			(cmd->transport_state & CMD_T_STOP) != 0,
>  			(cmd->transport_state & CMD_T_SENT) != 0);
>  
> -		/*
> -		 * If the command may be queued onto a workqueue cancel it now.
> -		 *
> -		 * This is equivalent to removal from the execute queue in the
> -		 * loop above, but we do it down here given that
> -		 * cancel_work_sync may block.
> -		 */
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		core_tmr_handle_tas_abort(cmd, tas);
> +		while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ))
> +			pr_debug("LUN RESET: still waiting for cmd with ITT %#llx\n",
> +				 cmd->tag);
>  		target_put_sess_cmd(cmd);
>  	}
>  }

Same bug as above for aborting se_cmd from se_device->state_list.

There is no way for target_release_cmd_kref() to ever invoke
complete_all(&se_cmd->finished) to make forward progress beyond the
wait_for_completion_timeout() here until the subsequent
target_put_sess_cmd() is called to drop se_cmd->cmd_kref to zero.

> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 40dff4799984..8506ecc0d6c0 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -650,25 +650,6 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
>  		percpu_ref_put(&lun->lun_ref);
>  }
>  
> -void transport_cmd_finish_abort(struct se_cmd *cmd, int remove)
> -{
> -	bool ack_kref = (cmd->se_cmd_flags & SCF_ACK_KREF);
> -
> -	if (cmd->se_cmd_flags & SCF_SE_LUN_CMD)
> -		transport_lun_remove_cmd(cmd);
> -	/*
> -	 * Allow the fabric driver to unmap any resources before
> -	 * releasing the descriptor via TFO->release_cmd()
> -	 */
> -	if (remove)
> -		cmd->se_tfo->aborted_task(cmd);
> -
> -	if (transport_cmd_check_stop_to_fabric(cmd))
> -		return;
> -	if (remove && ack_kref)
> -		transport_put_cmd(cmd);
> -}
> -
>  static void target_complete_failure_work(struct work_struct *work)
>  {
>  	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
> @@ -700,14 +681,58 @@ static unsigned char *transport_get_sense_buffer(struct se_cmd *cmd)
>  	return cmd->sense_buffer;
>  }
>  
> +static void transport_handle_abort(struct se_cmd *cmd)
> +{
> +	bool ack_kref = cmd->se_cmd_flags & SCF_ACK_KREF;
> +
> +	transport_lun_remove_cmd(cmd);
> +
> +	if (cmd->send_abort_response) {
> +		cmd->scsi_status = SAM_STAT_TASK_ABORTED;
> +		pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
> +			 cmd->t_task_cdb[0], cmd->tag);
> +		trace_target_cmd_complete(cmd);
> +		cmd->se_tfo->queue_status(cmd);
> +		transport_cmd_check_stop_to_fabric(cmd);
> +	} else {
> +		/*
> +		 * Allow the fabric driver to unmap any resources before
> +		 * releasing the descriptor via TFO->release_cmd()
> +		 */
> +		cmd->se_tfo->aborted_task(cmd);
> +		/*
> +		 * To do: establish a unit attention condition on the I_T
> +		 * nexus associated with cmd. See also the paragraph "Aborting
> +		 * commands" in SAM.
> +		 */
> +		if (transport_cmd_check_stop_to_fabric(cmd) == 0 && ack_kref)
> +			transport_put_cmd(cmd);
> +	}
> +}
> +
> +static void target_abort_work(struct work_struct *work)
> +{
> +	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
> +
> +	transport_handle_abort(cmd);
> +}
> +
> +/* May be called from interrupt context so must not sleep. */
>  void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  {
>  	struct se_device *dev = cmd->se_dev;
>  	int success = scsi_status == GOOD;
>  	unsigned long flags;
>  
> -	cmd->scsi_status = scsi_status;
> +	if (cmd->transport_state & CMD_T_ABORTED) {
> +		INIT_WORK(&cmd->work, target_abort_work);
> +		goto queue_work;
> +	} else if (cmd->transport_state & CMD_T_STOP) {
> +		complete_all(&cmd->t_transport_stop_comp);
> +		return;
> +	}
>  
> +	cmd->scsi_status = scsi_status;
>  
>  	spin_lock_irqsave(&cmd->t_state_lock, flags);
>  	cmd->transport_state &= ~CMD_T_BUSY;
> @@ -720,16 +745,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  			success = 1;
>  	}
>  
> -	/*
> -	 * Check for case where an explicit ABORT_TASK has been received
> -	 * and transport_wait_for_tasks() will be waiting for completion..
> -	 */
> -	if (cmd->transport_state & CMD_T_ABORTED ||
> -	    cmd->transport_state & CMD_T_STOP) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		complete_all(&cmd->t_transport_stop_comp);
> -		return;
> -	} else if (!success) {
> +	if (!success) {
>  		INIT_WORK(&cmd->work, target_complete_failure_work);
>  	} else {
>  		INIT_WORK(&cmd->work, target_complete_ok_work);
> @@ -739,6 +755,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
>  	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
>  
> +queue_work:
>  	if (cmd->se_cmd_flags & SCF_USE_CPUID)
>  		queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
>  	else
> @@ -1221,6 +1238,7 @@ void transport_init_se_cmd(
>  	INIT_LIST_HEAD(&cmd->state_list);
>  	init_completion(&cmd->t_transport_stop_comp);
>  	init_completion(&cmd->cmd_wait_comp);
> +	init_completion(&cmd->finished);
>  	spin_lock_init(&cmd->t_state_lock);
>  	kref_init(&cmd->cmd_kref);
>  	cmd->transport_state = CMD_T_DEV_ACTIVE;
> @@ -1883,16 +1901,18 @@ static int __transport_check_aborted_status(struct se_cmd *, int);
>  void target_execute_cmd(struct se_cmd *cmd)
>  {
>  	/*
> +	 * If the received CDB has aleady been aborted stop processing it here.
> +	 */
> +	if (transport_check_aborted_status(cmd, 1) != 0)
> +		return;
> +
> +	/*
>  	 * Determine if frontend context caller is requesting the stopping of
>  	 * this command for frontend exceptions.
>  	 *
>  	 * If the received CDB has aleady been aborted stop processing it here.
>  	 */
>  	spin_lock_irq(&cmd->t_state_lock);
> -	if (__transport_check_aborted_status(cmd, 1)) {
> -		spin_unlock_irq(&cmd->t_state_lock);
> -		return;
> -	}
>  	if (cmd->transport_state & CMD_T_STOP) {
>  		pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
>  			__func__, __LINE__, cmd->tag);

The use of transport_check_aborted_status() instead of
__transport_check_aborted_status() means se_cmd->t_state_lock is taken
here twice for every I/O.

There is no reason to take this lock twice, so this extra acquisition
should be avoided to keep fast-path overhead down.

> @@ -2499,15 +2519,11 @@ static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas)
>  
>  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  {
> -	int ret = 0;
>  	bool aborted = false, tas = false;
>  
>  	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
>  		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	} else {
>  		if (wait_for_tasks)
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> @@ -2521,9 +2537,6 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  
>  		if (cmd->se_lun)
>  			transport_lun_remove_cmd(cmd);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	}
>  	/*
>  	 * If the task has been internally aborted due to TMR ABORT_TASK
> @@ -2534,10 +2547,8 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  	if (aborted) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
>  		wait_for_completion(&cmd->cmd_wait_comp);
> -		cmd->se_tfo->release_cmd(cmd);
> -		ret = 1;
>  	}
> -	return ret;
> +	return transport_put_cmd(cmd);
>  }

This code snippet is specific to the second order issue mentioned above
involving concurrent session shutdown (CMD_T_FABRIC_STOP) and
CMD_T_ABORTED.

transport_generic_free_cmd() is used by iscsi-target + iser-target
during session shutdown, and for the second order case existing code
expects I/O to be quiesced and blocked on ->cmd_wait_comp waiting for
se_cmd->cmd_kref to drop to zero, before invoking se_tfo->release_cmd()
directly.

Specifically, existing iscsi-target + iser-target code expects when
transport_generic_free_cmd() returns the command must no longer be in
flight, and for the CMD_T_ABORTED + CMD_T_FABRIC_STOP case, se_cmd
descriptor memory must be released directly.

>  EXPORT_SYMBOL(transport_generic_free_cmd);
>  
> @@ -2596,6 +2607,8 @@ static void target_release_cmd_kref(struct kref *kref)
>  	unsigned long flags;
>  	bool fabric_stop;
>  
> +	complete_all(&se_cmd->finished);
> +

As mentioned in the three cases above for CMD_T_ABORTED, this is the
fatal flaw of the patch.

When CMD_T_ABORTED descriptors take a se_cmd->cmd_kref and then block on
se_cmd->finished, there is no way for them to make forward progress,
because this complete_all() is only ever called when se_cmd->cmd_kref
reaches zero.

So as-is, any se_cmd descriptor that reaches the CMD_T_ABORTED logic
will always result in a hung task, no matter what.

Btw, I did check the final outcome of this series just to be sure it
wasn't fixed in subsequent patches, and the same flaw exists.

>  	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
>  
>  	spin_lock(&se_cmd->t_state_lock);
> @@ -2697,7 +2710,7 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
>  			" fabric state: %d\n", se_cmd, se_cmd->t_state,
>  			se_cmd->se_tfo->get_cmd_state(se_cmd));
>  
> -		se_cmd->se_tfo->release_cmd(se_cmd);
> +		target_put_sess_cmd(se_cmd);
>  	}
>  

This looks like a similar forward-progress issue here in
target_wait_for_sess_cmds().

There is a wait_for_completion(&se_cmd->cmd_wait_comp) right above the
original se_tfo->release_cmd(), so this change to do
target_put_sess_cmd() instead means se_cmd->cmd_wait_comp will be
blocked forever waiting for se_cmd->cmd_kref to reach zero via
target_put_sess_cmd().

>  	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
> @@ -3009,16 +3022,7 @@ static int __transport_check_aborted_status(struct se_cmd *cmd, int send_status)
>  		return 1;
>  	}
>  
> -	pr_debug("Sending delayed SAM_STAT_TASK_ABORTED status for CDB:"
> -		" 0x%02x ITT: 0x%08llx\n", cmd->t_task_cdb[0], cmd->tag);
> -
>  	cmd->se_cmd_flags &= ~SCF_SEND_DELAYED_TAS;
> -	cmd->scsi_status = SAM_STAT_TASK_ABORTED;
> -	trace_target_cmd_complete(cmd);
> -
> -	spin_unlock_irq(&cmd->t_state_lock);
> -	cmd->se_tfo->queue_status(cmd);
> -	spin_lock_irq(&cmd->t_state_lock);
>  
>  	return 1;
>  }
> @@ -3035,47 +3039,6 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
>  }
>  EXPORT_SYMBOL(transport_check_aborted_status);
>  
> -void transport_send_task_abort(struct se_cmd *cmd)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -	/*
> -	 * If there are still expected incoming fabric WRITEs, we wait
> -	 * until until they have completed before sending a TASK_ABORTED
> -	 * response.  This response with TASK_ABORTED status will be
> -	 * queued back to fabric module by transport_check_aborted_status().
> -	 */
> -	if (cmd->data_direction == DMA_TO_DEVICE) {
> -		if (cmd->se_tfo->write_pending_status(cmd) != 0) {
> -			spin_lock_irqsave(&cmd->t_state_lock, flags);
> -			if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) {
> -				spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -				goto send_abort;
> -			}
> -			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
> -			spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -			return;
> -		}
> -	}

Not 100% sure, but I think this also breaks iscsi-target + iser-target.

The fabric driver needs to allow WRITE transfers to complete once
CMD_T_ABORTED is set, but then prevent the submission into the backend.

AFAICT the removal of this means the se_cmd is aborted right away, and
iscsi-target + iser-target aren't allowed to complete their WRITE
payloads.

> -send_abort:
> -	cmd->scsi_status = SAM_STAT_TASK_ABORTED;
> -
> -	transport_lun_remove_cmd(cmd);
> -
> -	pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
> -		 cmd->t_task_cdb[0], cmd->tag);
> -
> -	trace_target_cmd_complete(cmd);
> -	cmd->se_tfo->queue_status(cmd);
> -}
> -
>  static void target_tmr_work(struct work_struct *work)
>  {
>  	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
> diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
> index fb87f0e5d0d6..e690ce36e592 100644
> --- a/include/target/target_core_base.h
> +++ b/include/target/target_core_base.h
> @@ -444,6 +444,7 @@ struct se_cmd {
>  	unsigned		cmd_wait_set:1;
>  	unsigned		unknown_data_length:1;
>  	bool			state_active:1;
> +	bool			send_abort_response:1;
>  	u64			tag; /* SAM command identifier aka task tag */
>  	/* Delay for ALUA Active/NonOptimized state access in milliseconds */
>  	int			alua_nonop_delay;
> @@ -496,6 +497,7 @@ struct se_cmd {
>  	spinlock_t		t_state_lock;
>  	struct kref		cmd_kref;
>  	struct completion	t_transport_stop_comp;
> +	struct completion	finished;
>  
>  	struct work_struct	work;
>  


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche Feb. 7, 2017, 10:27 p.m. UTC | #2
On Tue, 2017-02-07 at 07:40 -0800, Nicholas A. Bellinger wrote:
> However, there is a basic fatal flaw in this approach
> with this patch as-is that results in se_cmd->finished never being able
> to complete for any se_cmd with CMD_T_ABORTED .
> 
> For the above, core_tmr_abort_task() calls __target_check_io_state() to
> obtain a se_cmd->cmd_kref if the descriptor is able to be aborted and
> sets CMD_T_ABORTED.  Then, wait_for_completion_timeout() sits in a while
> looping waiting for se_cmd->finished to be completed.
> 
> However, the complete_all(&se_cmd->finished) you've added below is only
> invoked from the final se_cmd->cmd_kref callback in
> target_release_cmd_kref().
> 
> Which means core_tmr_abort_task() will always be stuck indefinitely on
> se_cmd->finished above, because the subsequent target_put_sess_cmd()
> after the wait_for_completion_timeout() is the caller responsible for
> doing the final kref_put(&se_cmd->cmd_kref, target_release_cmd_kref)
> to perform complete_all(&se_cmd->finished).

A fix for this is in test. I will post a new patch as soon as my tests have
finished.

> >  		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
> > @@ -295,14 +276,31 @@ static void core_tmr_drain_tmr_list(
> >  			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
> >  			tmr_p->function, tmr_p->response, cmd->t_state);
> >  
> > -		cancel_work_sync(&cmd->work);
> > -		transport_wait_for_tasks(cmd);
> > -
> > -		transport_cmd_finish_abort(cmd, 1);
> > +		while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ))
> > +			pr_debug("LUN RESET: still waiting for ITT %#llx\n",
> > +				 cmd->tag);
> >  		target_put_sess_cmd(cmd);
> >  	}
> >  }
> >  
> 
> Same thing here for TMR abort.
> 
> There is no way for target_release_cmd_kref() to ever invoke
> complete_all(&se_cmd->finished) to make forward progress beyond the
> wait_for_completion_timeout() here until the subsequent
> target_put_sess_cmd() is called to drop se_cmd->cmd_kref to zero.
> 
> > @@ -340,6 +339,10 @@ static void core_tmr_drain_state_list(
> >  	 */
> >  	spin_lock_irqsave(&dev->execute_task_lock, flags);
> >  	list_for_each_entry_safe(cmd, next, &dev->state_list, state_list) {
> > +		/* Skip task management functions. */
> > +		if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
> > +			continue;
> > +
> 
> TMRs are never added to se_device->state_list, so this extra check is
> unnecessary.

OK, I will leave this out.

> >  void target_execute_cmd(struct se_cmd *cmd)
> >  {
> >  	/*
> > +	 * If the received CDB has aleady been aborted stop processing it here.
> > +	 */
> > +	if (transport_check_aborted_status(cmd, 1) != 0)
> > +		return;
> > +
> > +	/*
> >  	 * Determine if frontend context caller is requesting the stopping of
> >  	 * this command for frontend exceptions.
> >  	 *
> >  	 * If the received CDB has aleady been aborted stop processing it here.
> >  	 */
> >  	spin_lock_irq(&cmd->t_state_lock);
> > -	if (__transport_check_aborted_status(cmd, 1)) {
> > -		spin_unlock_irq(&cmd->t_state_lock);
> > -		return;
> > -	}
> >  	if (cmd->transport_state & CMD_T_STOP) {
> >  		pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
> >  			__func__, __LINE__, cmd->tag);
> 
> The use of transport_check_aborted_status(), instead of using
> __transport_check_aborted_status() means se_cmd->t_state_lock is taken
> here twice for every I/O.
> 
> There is no reason to take this lock twice, so to avoid additional
> fast-path overhead this part should be avoided.

A later patch makes transport_check_aborted_status() lockless.

> > @@ -2534,10 +2547,8 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
> >  	if (aborted) {
> >  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
> >  		wait_for_completion(&cmd->cmd_wait_comp);
> > -		cmd->se_tfo->release_cmd(cmd);
> > -		ret = 1;
> >  	}
> > -	return ret;
> > +	return transport_put_cmd(cmd);
> >  }
> 
> This code snippet is specific to the second order issue mentioned above
> involving concurrent session shutdown (CMD_T_FABRIC_STOP) and
> CMD_T_ABORTED.
> 
> transport_generic_free_cmd() is used by iscsi-target + iser-target
> during session shutdown, and for the second order case existing code
> expects I/O to be quiesced and blocked on ->cmd_wait_comp waiting for
> se_cmd->cmd_kref to drop to zero, before invoking se_tfo->release_cmd()
> directly.
> 
> Specifically, existing iscsi-target + iser-target code expects when
> transport_generic_free_cmd() returns the command must no longer be in
> flight, and for the CMD_T_ABORTED + CMD_T_FABRIC_STOP case, se_cmd
> descriptor memory must be released directly.

I will make sure that all target driver work has completed before
transport_generic_free_cmd() returns.

Bart.
Nicholas A. Bellinger Feb. 8, 2017, 3:11 a.m. UTC | #3
On Tue, 2017-02-07 at 22:27 +0000, Bart Van Assche wrote:
> On Tue, 2017-02-07 at 07:40 -0800, Nicholas A. Bellinger wrote:
> > However, there is a basic fatal flaw in this approach
> > with this patch as-is that results in se_cmd->finished never being able
> > to complete for any se_cmd with CMD_T_ABORTED .
> > 
> > For the above, core_tmr_abort_task() calls __target_check_io_state() to
> > obtain a se_cmd->cmd_kref if the descriptor is able to be aborted and
> > sets CMD_T_ABORTED.  Then, wait_for_completion_timeout() sits in a while
> > looping waiting for se_cmd->finished to be completed.
> > 
> > However, the complete_all(&se_cmd->finished) you've added below is only
> > invoked from the final se_cmd->cmd_kref callback in
> > target_release_cmd_kref().
> > 
> > Which means core_tmr_abort_task() will always be stuck indefinitely on
> > se_cmd->finished above, because the subsequent target_put_sess_cmd()
> > after the wait_for_completion_timeout() is the caller responsible for
> > doing the final kref_put(&se_cmd->cmd_kref, target_release_cmd_kref)
> > to perform complete_all(&se_cmd->finished).
> 
> A fix for this is in test. I will post a new patch as soon as my tests have
> finished.

Can you elaborate a bit on how exactly you've been testing these changes
for the first order (handle ABORT_TASKs and LUN_RESET with outstanding
backend I/O) and second order (handle session shutdown while order one
is active) issues mentioned earlier..?

My main concern is that your tests didn't catch the basic first order
issue from the earlier review, or at least that they didn't check for
hung tasks or proper target shutdown after the test completed.

The nightly automated runs that we (Datera) perform for v4.1.y and
v3.14.y production include the following:

Provision 500 target endpoints + luns per node in a 2500 volume cluster.
Run a mixed 4k random workload, and drive up the backend I/O latency to
the point where open-iscsi starts to send ABORT_TASK (15 second
default), LUN_RESET and target reset / session reinstatement (30 second
default).

This typically generates 1000's of ABORT_TASKs and session resets, and
while that is going on we also perform explicit target migration via
configfs (e.g. the third order issue), which performs target configfs
shutdown while the first and second order issues are occurring.

ESX is also a really good test for TMRs, because it generates ABORT_TASK
after 5 seconds of backend I/O latency, and is very aggressive in
attempting to bring the device back online via repeated TMRs.

So I don't think you need 100s of LUNs to verify these changes, but I'd
at least like you to verify with a few dozen target + LUNs with a mixed
random small block workload with a sufficiently deep iodepth (say 16),
and ensure at least a few hundred ABORT_TASK, LUN_RESET and session
shutdowns are really occurring during your test.

Once the test is completed, perform an explicit fabric logout and target
shutdown.  This is to ensure there are no reference leaks, kernel hung
tasks in un-interruptible sleep, or other memory leaks that have been
introduced by the series.

Bart Van Assche Feb. 8, 2017, 6:16 p.m. UTC | #4
On Tue, 2017-02-07 at 19:11 -0800, Nicholas A. Bellinger wrote:
> Can you elaborate a bit on how exactly you've been testing these changes
> for the first order (handle ABORT_TASKs and LUN_RESET with outstanding
> backend I/O) and second order (handle session shutdown while order one
> is active) issues mentioned earlier..?

Previous tests were run against a fileio backend. That's probably why my
tests passed despite the circular wait in the TMF code (that has already
been solved BTW). Anyway, the tests I run to verify the TMF code are:
1. The libiscsi unit tests.
2. Running fio with a high queue depth and data verification enabled in
   one shell and the following code in a second shell:

while true; do sg_reset -d /dev/sd...; sleep .1; echo -n .; done
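For reference, the fio side of such a test could be expressed as a job file along the following lines; the device path, queue depth, and runtime below are illustrative assumptions, not values taken from this thread:

```ini
; sketch of a high-queue-depth verify workload (values are assumptions)
[abort-stress]
filename=/dev/sdX        ; device under test -- replace with the real LUN
ioengine=libaio
direct=1
rw=randrw
bs=4k
iodepth=32               ; high queue depth so resets hit in-flight I/O
verify=crc32c            ; data verification, as described above
time_based=1
runtime=600
```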

> Once the test is completed, perform an explicit fabric logout and target
> shutdown.  This is to ensure there are no reference leaks, kernel hung
> tasks in un-interruptible sleep, or other memory leaks that have been
> introduced by the series.

I run all my tests against a kernel with kmemleak enabled. That's a very
effective way for detecting memory leaks.

Bart.

Nicholas A. Bellinger Feb. 8, 2017, 7:06 p.m. UTC | #5
On Wed, 2017-02-08 at 18:16 +0000, Bart Van Assche wrote:
> On Tue, 2017-02-07 at 19:11 -0800, Nicholas A. Bellinger wrote:
> > Can you elaborate a bit on how exactly you've been testing these changes
> > for the first order (handle ABORT_TASKs and LUN_RESET with outstanding
> > backend I/O) and second order (handle session shutdown while order one
> > is active) issues mentioned earlier..?
> 
> Previous tests were run against a fileio backend.

That would explain why the basic first order functionality didn't work,
because the code-paths that were changed were never actually tested.

>  That's probably why my
> tests passed despite the circular wait in the TMF code (that has already
> been solved BTW).

Yeah, already looked at your change.  It's a hack.

Adding a second atomic_t emulating a kref on top of the existing
se_cmd->cmd_kref, and then doing a complete_all() for the special TMR
path in the normal fast-path is not going to be acceptable.

Like I said, I'm all for improvements in the TMR area, but you'll need to
be a lot more methodical about how you're proposing the changes, and how
your QA team (manual or automation) is verifying the changes.

>  Anyway, the tests I run to verify the TMF code are:
> 1. The libiscsi unit tests.
> 2. Running fio with a high queue depth and data verification enabled in
>    one shell and the following code in a second shell:
> 
> while true; do sg_reset -d /dev/sd...; sleep .1; echo -n .; done
> 

How many individual occurrences of the first order issue (ABORT_TASK +
LUN_RESET), second order issue (session reinstatement while the first
order is active), and third order issue (configfs shutdown while the
second order is active) have you tested..?

How many total volumes and nodes are you using..?

> > Once the test is completed, perform an explicit fabric logout and target
> > shutdown.  This is to ensure there are no reference leaks, kernel hung
> > tasks in un-interruptible sleep, or other memory leaks that have been
> > introduced by the series.
> 
> I run all my tests against a kernel with kmemleak enabled. That's a very
> effective way for detecting memory leaks.

Can you give an idea of how many dedicated QA resources Sandisk is
putting on upstream target development..?

Do you have production customers using it..?


Bart Van Assche Feb. 8, 2017, 8:14 p.m. UTC | #6
On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote:
> Adding a second atomic_t emulating a kref on top of the existing
> se_cmd->cmd_kref, and then doing a complete_all() for the special TMR
> path in the normal fast-path is not going to be acceptable.

I don't see why that approach would not be acceptable. My patches remove
more atomic operations than the number of new atomic operations that are
introduced in the hot path, so these patches will improve performance.

Bart.

Nicholas A. Bellinger Feb. 8, 2017, 10:49 p.m. UTC | #7
On Wed, 2017-02-08 at 20:14 +0000, Bart Van Assche wrote:
> On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote:
> > Adding a second atomic_t emulating a kref on top of the existing
> > se_cmd->cmd_kref, and then doing a complete_all() for the special TMR
> > path in the normal fast-path is not going to be acceptable.
> 
> I don't see why that approach would not be acceptable. My patches remove
> more atomic operations than the number of new atomic operations that are
> introduced in the hot path so these patches will improve performance.

You've got less than 24 hours of soak time after I highlighted a rather
fundamental flaw in the v2 approach to the first order issue of how
ABORT_TASK and LUN_RESET function when backend I/O is taking an extended
amount of time to complete.

This tells me your entire manual testing of these code-paths thus far
has been rather useless.

Have you tested the second and third order issues at scale yet to ensure
these changes don't introduce regressions, vs. what we know is stable in
mainline..?  If so, how many LUNs, nodes, and how many occurrences of
each order issue..?

The point is that you need to remove the target patches from linux-next
that don't have reviews, and to answer to and understand the feedback
I've given you to think about, considering it's taken years to get this
particular area of the code stable across the drivers in mainline.

Beyond that, if you want to make larger changes you need to really
consider getting dedicated QA resources and/or automation set up from
your employer to test them for extended periods of time, under stressful
conditions.

Trying to manually test these large changes at small scale in your spare
time, efficiently avoiding the interesting code-paths until the obvious
flaws were pointed out on the list, is unacceptable.





Bart Van Assche Feb. 8, 2017, 11:04 p.m. UTC | #8
On Wed, 2017-02-08 at 14:49 -0800, Nicholas A. Bellinger wrote:
> On Wed, 2017-02-08 at 20:14 +0000, Bart Van Assche wrote:
> > On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote:
> > > Adding a second atomic_t emulating a kref on top of the existing
> > > se_cmd->cmd_kref, and then doing a complete_all() for the special TMR
> > > path in the normal fast-path is not going to be acceptable.
> > 
> > I don't see why that approach would not be acceptable. My patches remove
> > more atomic operations than the number of new atomic operations that are
> > introduced in the hot path so these patches will improve performance.
> 
> You've got less than 24 hours of soak time [ ... ]

You are irritating me, and it's not very smart to do that. Anyway, a good
kernel developer develops tests to reproduce race conditions in much less time
than 24 hours.

Bart.

Nicholas A. Bellinger Feb. 8, 2017, 11:09 p.m. UTC | #9
On Wed, 2017-02-08 at 23:04 +0000, Bart Van Assche wrote:
> On Wed, 2017-02-08 at 14:49 -0800, Nicholas A. Bellinger wrote:
> > On Wed, 2017-02-08 at 20:14 +0000, Bart Van Assche wrote:
> > > On Wed, 2017-02-08 at 11:06 -0800, Nicholas A. Bellinger wrote:
> > > > Adding a second atomic_t emulating a kref on top of the existing
> > > > se_cmd->cmd_kref, and then doing a complete_all() for the special TMR
> > > > path in the normal fast-path is not going to be acceptable.
> > > 
> > > I don't see why that approach would not be acceptable. My patches remove
> > > more atomic operations than the number of new atomic operations that are
> > > introduced in the hot path so these patches will improve performance.
> > 
> > You've got less than 24 hours of soak time [ ... ]
> 
> You are irritating me, and it's not very smart to do that. Anyway, a good
> kernel developer develops tests to reproduce race conditions in much less time
> than 24 hours.

Listen, my primary concern is the integrity of the code-paths that have
taken us years to get stable.

Based upon your actions, it looks like your primary concern is getting
large untested and unreviewed changes that affect complex code-paths
into linux-next, and into mainline.

This, for a codebase that you don't have dedicated QA resources or
automation assigned from your employer, nor production customers to
answer to.

So if you want to be a LIO maintainer, you'll need to start answering
to the LIO community for your actions.

Bart Van Assche Feb. 8, 2017, 11:15 p.m. UTC | #10
On Wed, 2017-02-08 at 15:09 -0800, Nicholas A. Bellinger wrote:
> Listen, my primary concern is the integrity of the code-paths that has
> taken us years to get stable.

If that were true you would have taken this patch series upstream two
years ago, when I first posted it. The series makes the SCSI target core
much more reliable and much easier to maintain. However, instead of
helping this patch series make progress two years ago, you chose to
obstruct this work and to increase the complexity of the target core
further. You still have a chance to show that reliability and
maintainability of the target core are your primary concern by posting
review comments for v4 of the patch series I posted.

Bart.

Nicholas A. Bellinger Feb. 8, 2017, 11:43 p.m. UTC | #11
On Wed, 2017-02-08 at 23:15 +0000, Bart Van Assche wrote:
> On Wed, 2017-02-08 at 15:09 -0800, Nicholas A. Bellinger wrote:
> > Listen, my primary concern is the integrity of the code-paths that has
> > taken us years to get stable.
> 
> If that would be true you would have taken this patch series upstream two
> years ago, when I posted this patch series for the first time.

Actually, I responded back then with an explanation of why it was the
wrong approach, and of the different issues to consider when touching
this particular area of code:

http://www.spinics.net/lists/target-devel/msg11057.html

And this was your response:

http://www.spinics.net/lists/target-devel/msg11542.html

Your response disregarded the feedback, focusing on a trivial comment
and ignoring all others of substance.  This is why I didn't merge your
patch.

So I did the same thing again yesterday: went through the patch line by
line, explaining the first, second and third order issues to consider
when touching this complex code-path.

The only difference is that yesterday it was shown the patch could never
have worked, due to a fundamental flaw in the approach, and that the
interesting code-paths in question were never actually hit by your
manual testing.

>  This patch
> series namely makes the SCSI target core much more reliable and much easier
> to maintain. 

Making it 'much more readable' at the cost of stability, for something
that you've been unable to verify with QA regression or automation
testing, is not acceptable.

Like I said, you'll need to start being a lot more methodical about
incremental patches to complex code-paths, and about the testing
approach, if you want to make larger changes to this area in mainline.

You've got like 5 patches for spelling fixes, and only 1 for the large
TMR change..?  Really..?

> However, instead of helping to make this patch series make
> progress two years ago you chose to obstruct this work and to increase the
> complexity of the target core further. You still have a chance to show that
> reliability and maintainability of the target core is your primary concern
> by posting review comments for v4 of the patch series I posted.
> 

If you continue to ignore feedback and are unwilling or unable to
thoroughly QA regression test changes in complex areas that have taken
years to get working across the tree, don't expect a different outcome
this time around either.

Bart Van Assche Feb. 8, 2017, 11:48 p.m. UTC | #12
On Wed, 2017-02-08 at 15:43 -0800, Nicholas A. Bellinger wrote:
> [ ... ] complex areas that have taken years to get working across the tree [ ... ]

This comment by itself shows that the target core needs to be simplified.
There is only one reason why it took so long to improve stability of the
SCSI target core, namely that it is too complex.

Bart.
Nicholas A. Bellinger Feb. 9, 2017, 12:05 a.m. UTC | #13
On Wed, 2017-02-08 at 23:48 +0000, Bart Van Assche wrote:
> On Wed, 2017-02-08 at 15:43 -0800, Nicholas A. Bellinger wrote:
> > [ ... ] complex areas that have taken years to get working across the tree [ ... ]
> 
> This comment by itself shows that the target core needs to be simplified.
> There is only one reason why it took so long to improve stability of the
> SCSI target core, namely that it is too complex.

Ignore and snip all you want, but it doesn't change the fact that you've
been putting untested and unreviewed junk into linux-next.

That will need to stop immediately.

If your employer is willing to make a commitment for QA and/or
automation resources to regression test (at scale) these type of complex
target changes, then I'd be much less concerned about things.

If that is not the case, and you've got no production customers for LIO
to answer to, and are only doing small-scale manual regression testing
in your spare time, then the risk of regressions for your 'cleanups' in
complex areas will continue to be an issue.





Patch

diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index 9ab7090f7c83..6f6cae81ab8e 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -136,7 +136,6 @@  int	init_se_kmem_caches(void);
 void	release_se_kmem_caches(void);
 u32	scsi_get_new_index(scsi_index_t);
 void	transport_subsystem_check_init(void);
-void	transport_cmd_finish_abort(struct se_cmd *, int);
 unsigned char *transport_dump_cmd_direction(struct se_cmd *);
 void	transport_dump_dev_state(struct se_device *, char *, int *);
 void	transport_dump_dev_info(struct se_device *, struct se_lun *,
@@ -146,7 +145,6 @@  int	transport_dump_vpd_assoc(struct t10_vpd *, unsigned char *, int);
 int	transport_dump_vpd_ident_type(struct t10_vpd *, unsigned char *, int);
 int	transport_dump_vpd_ident(struct t10_vpd *, unsigned char *, int);
 void	transport_clear_lun_ref(struct se_lun *);
-void	transport_send_task_abort(struct se_cmd *);
 sense_reason_t	target_cmd_size_check(struct se_cmd *cmd, unsigned int size);
 void	target_qf_do_work(struct work_struct *work);
 bool	target_check_wce(struct se_device *dev);
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 367799b4dde1..bc07e059fb22 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -75,25 +75,6 @@  void core_tmr_release_req(struct se_tmr_req *tmr)
 	kfree(tmr);
 }
 
-static void core_tmr_handle_tas_abort(struct se_cmd *cmd, int tas)
-{
-	unsigned long flags;
-	bool remove = true, send_tas;
-	/*
-	 * TASK ABORTED status (TAS) bit support
-	 */
-	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	send_tas = (cmd->transport_state & CMD_T_TAS);
-	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-
-	if (send_tas) {
-		remove = false;
-		transport_send_task_abort(cmd);
-	}
-
-	transport_cmd_finish_abort(cmd, remove);
-}
-
 static int target_check_cdb_and_preempt(struct list_head *list,
 		struct se_cmd *cmd)
 {
@@ -192,13 +173,13 @@  void core_tmr_abort_task(
 			continue;
 		}
 
-		list_del_init(&se_cmd->se_cmd_list);
+		se_cmd->send_abort_response = false;
 		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
 
-		cancel_work_sync(&se_cmd->work);
-		transport_wait_for_tasks(se_cmd);
+		while (!wait_for_completion_timeout(&se_cmd->finished, 180 * HZ))
+			pr_debug("ABORT TASK: still waiting for ITT %#llx\n",
+				 ref_tag);
 
-		transport_cmd_finish_abort(se_cmd, true);
 		target_put_sess_cmd(se_cmd);
 
 		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
@@ -295,14 +276,31 @@  static void core_tmr_drain_tmr_list(
 			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
 			tmr_p->function, tmr_p->response, cmd->t_state);
 
-		cancel_work_sync(&cmd->work);
-		transport_wait_for_tasks(cmd);
-
-		transport_cmd_finish_abort(cmd, 1);
+		while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ))
+			pr_debug("LUN RESET: still waiting for ITT %#llx\n",
+				 cmd->tag);
 		target_put_sess_cmd(cmd);
 	}
 }
 
+/**
+ * core_tmr_drain_state_list() - Abort SCSI commands associated with a device.
+ *
+ * @dev:       Device for which to abort outstanding SCSI commands.
+ * @prout_cmd: Pointer to the SCSI PREEMPT AND ABORT if this function is called
+ *             to realize the PREEMPT AND ABORT functionality.
+ * @tmr_sess:  Session through which the LUN RESET has been received.
+ * @tas:       Task Aborted Status (TAS) bit from the SCSI control mode page.
+ *             A quote from SPC-4, paragraph "7.5.10 Control mode page":
+ *             "A task aborted status (TAS) bit set to zero specifies that
+ *             aborted commands shall be terminated by the device server
+ *             without any response to the application client. A TAS bit set
+ *             to one specifies that commands aborted by the actions of an I_T
+ *             nexus other than the I_T nexus on which the command was
+ *             received shall be completed with TASK ABORTED status."
+ * @preempt_and_abort_list: For the PREEMPT AND ABORT functionality, a list
+ *             with registrations that will be preempted.
+ */
 static void core_tmr_drain_state_list(
 	struct se_device *dev,
 	struct se_cmd *prout_cmd,
@@ -311,6 +309,7 @@  static void core_tmr_drain_state_list(
 	struct list_head *preempt_and_abort_list)
 {
 	LIST_HEAD(drain_task_list);
+	struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL;
 	struct se_session *sess;
 	struct se_cmd *cmd, *next;
 	unsigned long flags;
@@ -340,6 +339,10 @@  static void core_tmr_drain_state_list(
 	 */
 	spin_lock_irqsave(&dev->execute_task_lock, flags);
 	list_for_each_entry_safe(cmd, next, &dev->state_list, state_list) {
+		/* Skip task management functions. */
+		if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
+			continue;
+
 		/*
 		 * For PREEMPT_AND_ABORT usage, only process commands
 		 * with a matching reservation key.
@@ -365,6 +368,8 @@  static void core_tmr_drain_state_list(
 
 		list_move_tail(&cmd->state_list, &drain_task_list);
 		cmd->state_active = false;
+		if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas)
+			cmd->send_abort_response = true;
 	}
 	spin_unlock_irqrestore(&dev->execute_task_lock, flags);
 
@@ -387,17 +392,9 @@  static void core_tmr_drain_state_list(
 			(cmd->transport_state & CMD_T_STOP) != 0,
 			(cmd->transport_state & CMD_T_SENT) != 0);
 
-		/*
-		 * If the command may be queued onto a workqueue cancel it now.
-		 *
-		 * This is equivalent to removal from the execute queue in the
-		 * loop above, but we do it down here given that
-		 * cancel_work_sync may block.
-		 */
-		cancel_work_sync(&cmd->work);
-		transport_wait_for_tasks(cmd);
-
-		core_tmr_handle_tas_abort(cmd, tas);
+		while (!wait_for_completion_timeout(&cmd->finished, 180 * HZ))
+			pr_debug("LUN RESET: still waiting for cmd with ITT %#llx\n",
+				 cmd->tag);
 		target_put_sess_cmd(cmd);
 	}
 }
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 40dff4799984..8506ecc0d6c0 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -650,25 +650,6 @@  static void transport_lun_remove_cmd(struct se_cmd *cmd)
 		percpu_ref_put(&lun->lun_ref);
 }
 
-void transport_cmd_finish_abort(struct se_cmd *cmd, int remove)
-{
-	bool ack_kref = (cmd->se_cmd_flags & SCF_ACK_KREF);
-
-	if (cmd->se_cmd_flags & SCF_SE_LUN_CMD)
-		transport_lun_remove_cmd(cmd);
-	/*
-	 * Allow the fabric driver to unmap any resources before
-	 * releasing the descriptor via TFO->release_cmd()
-	 */
-	if (remove)
-		cmd->se_tfo->aborted_task(cmd);
-
-	if (transport_cmd_check_stop_to_fabric(cmd))
-		return;
-	if (remove && ack_kref)
-		transport_put_cmd(cmd);
-}
-
 static void target_complete_failure_work(struct work_struct *work)
 {
 	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
@@ -700,14 +681,58 @@  static unsigned char *transport_get_sense_buffer(struct se_cmd *cmd)
 	return cmd->sense_buffer;
 }
 
+static void transport_handle_abort(struct se_cmd *cmd)
+{
+	bool ack_kref = cmd->se_cmd_flags & SCF_ACK_KREF;
+
+	transport_lun_remove_cmd(cmd);
+
+	if (cmd->send_abort_response) {
+		cmd->scsi_status = SAM_STAT_TASK_ABORTED;
+		pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
+			 cmd->t_task_cdb[0], cmd->tag);
+		trace_target_cmd_complete(cmd);
+		cmd->se_tfo->queue_status(cmd);
+		transport_cmd_check_stop_to_fabric(cmd);
+	} else {
+		/*
+		 * Allow the fabric driver to unmap any resources before
+		 * releasing the descriptor via TFO->release_cmd()
+		 */
+		cmd->se_tfo->aborted_task(cmd);
+		/*
+		 * To do: establish a unit attention condition on the I_T
+		 * nexus associated with cmd. See also the paragraph "Aborting
+		 * commands" in SAM.
+		 */
+		if (transport_cmd_check_stop_to_fabric(cmd) == 0 && ack_kref)
+			transport_put_cmd(cmd);
+	}
+}
+
+static void target_abort_work(struct work_struct *work)
+{
+	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
+
+	transport_handle_abort(cmd);
+}
+
+/* May be called from interrupt context so must not sleep. */
 void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 {
 	struct se_device *dev = cmd->se_dev;
 	int success = scsi_status == GOOD;
 	unsigned long flags;
 
-	cmd->scsi_status = scsi_status;
+	if (cmd->transport_state & CMD_T_ABORTED) {
+		INIT_WORK(&cmd->work, target_abort_work);
+		goto queue_work;
+	} else if (cmd->transport_state & CMD_T_STOP) {
+		complete_all(&cmd->t_transport_stop_comp);
+		return;
+	}
 
+	cmd->scsi_status = scsi_status;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
 	cmd->transport_state &= ~CMD_T_BUSY;
@@ -720,16 +745,7 @@  void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 			success = 1;
 	}
 
-	/*
-	 * Check for case where an explicit ABORT_TASK has been received
-	 * and transport_wait_for_tasks() will be waiting for completion..
-	 */
-	if (cmd->transport_state & CMD_T_ABORTED ||
-	    cmd->transport_state & CMD_T_STOP) {
-		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-		complete_all(&cmd->t_transport_stop_comp);
-		return;
-	} else if (!success) {
+	if (!success) {
 		INIT_WORK(&cmd->work, target_complete_failure_work);
 	} else {
 		INIT_WORK(&cmd->work, target_complete_ok_work);
@@ -739,6 +755,7 @@  void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
 	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 
+queue_work:
 	if (cmd->se_cmd_flags & SCF_USE_CPUID)
 		queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
 	else
@@ -1221,6 +1238,7 @@  void transport_init_se_cmd(
 	INIT_LIST_HEAD(&cmd->state_list);
 	init_completion(&cmd->t_transport_stop_comp);
 	init_completion(&cmd->cmd_wait_comp);
+	init_completion(&cmd->finished);
 	spin_lock_init(&cmd->t_state_lock);
 	kref_init(&cmd->cmd_kref);
 	cmd->transport_state = CMD_T_DEV_ACTIVE;
@@ -1883,16 +1901,18 @@  static int __transport_check_aborted_status(struct se_cmd *, int);
 void target_execute_cmd(struct se_cmd *cmd)
 {
 	/*
+	 * If the received CDB has already been aborted stop processing it here.
+	 */
+	if (transport_check_aborted_status(cmd, 1) != 0)
+		return;
+
+	/*
 	 * Determine if frontend context caller is requesting the stopping of
 	 * this command for frontend exceptions.
 	 *
 	 * If the received CDB has aleady been aborted stop processing it here.
 	 */
 	spin_lock_irq(&cmd->t_state_lock);
-	if (__transport_check_aborted_status(cmd, 1)) {
-		spin_unlock_irq(&cmd->t_state_lock);
-		return;
-	}
 	if (cmd->transport_state & CMD_T_STOP) {
 		pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
 			__func__, __LINE__, cmd->tag);
@@ -2499,15 +2519,11 @@  static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas)
 
 int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 {
-	int ret = 0;
 	bool aborted = false, tas = false;
 
 	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
 		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
 			target_wait_free_cmd(cmd, &aborted, &tas);
-
-		if (!aborted || tas)
-			ret = transport_put_cmd(cmd);
 	} else {
 		if (wait_for_tasks)
 			target_wait_free_cmd(cmd, &aborted, &tas);
@@ -2521,9 +2537,6 @@  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 
 		if (cmd->se_lun)
 			transport_lun_remove_cmd(cmd);
-
-		if (!aborted || tas)
-			ret = transport_put_cmd(cmd);
 	}
 	/*
 	 * If the task has been internally aborted due to TMR ABORT_TASK
@@ -2534,10 +2547,8 @@  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 	if (aborted) {
 		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
 		wait_for_completion(&cmd->cmd_wait_comp);
-		cmd->se_tfo->release_cmd(cmd);
-		ret = 1;
 	}
-	return ret;
+	return transport_put_cmd(cmd);
 }
 EXPORT_SYMBOL(transport_generic_free_cmd);
 
@@ -2596,6 +2607,8 @@  static void target_release_cmd_kref(struct kref *kref)
 	unsigned long flags;
 	bool fabric_stop;
 
+	complete_all(&se_cmd->finished);
+
 	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
 
 	spin_lock(&se_cmd->t_state_lock);
@@ -2697,7 +2710,7 @@  void target_wait_for_sess_cmds(struct se_session *se_sess)
 			" fabric state: %d\n", se_cmd, se_cmd->t_state,
 			se_cmd->se_tfo->get_cmd_state(se_cmd));
 
-		se_cmd->se_tfo->release_cmd(se_cmd);
+		target_put_sess_cmd(se_cmd);
 	}
 
 	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
@@ -3009,16 +3022,7 @@  static int __transport_check_aborted_status(struct se_cmd *cmd, int send_status)
 		return 1;
 	}
 
-	pr_debug("Sending delayed SAM_STAT_TASK_ABORTED status for CDB:"
-		" 0x%02x ITT: 0x%08llx\n", cmd->t_task_cdb[0], cmd->tag);
-
 	cmd->se_cmd_flags &= ~SCF_SEND_DELAYED_TAS;
-	cmd->scsi_status = SAM_STAT_TASK_ABORTED;
-	trace_target_cmd_complete(cmd);
-
-	spin_unlock_irq(&cmd->t_state_lock);
-	cmd->se_tfo->queue_status(cmd);
-	spin_lock_irq(&cmd->t_state_lock);
 
 	return 1;
 }
@@ -3035,47 +3039,6 @@  int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
 }
 EXPORT_SYMBOL(transport_check_aborted_status);
 
-void transport_send_task_abort(struct se_cmd *cmd)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
-		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-		return;
-	}
-	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-
-	/*
-	 * If there are still expected incoming fabric WRITEs, we wait
-	 * until until they have completed before sending a TASK_ABORTED
-	 * response.  This response with TASK_ABORTED status will be
-	 * queued back to fabric module by transport_check_aborted_status().
-	 */
-	if (cmd->data_direction == DMA_TO_DEVICE) {
-		if (cmd->se_tfo->write_pending_status(cmd) != 0) {
-			spin_lock_irqsave(&cmd->t_state_lock, flags);
-			if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) {
-				spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-				goto send_abort;
-			}
-			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
-			spin_unlock_irqrestore(&cmd->t_state_lock, flags);
-			return;
-		}
-	}
-send_abort:
-	cmd->scsi_status = SAM_STAT_TASK_ABORTED;
-
-	transport_lun_remove_cmd(cmd);
-
-	pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
-		 cmd->t_task_cdb[0], cmd->tag);
-
-	trace_target_cmd_complete(cmd);
-	cmd->se_tfo->queue_status(cmd);
-}
-
 static void target_tmr_work(struct work_struct *work)
 {
 	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index fb87f0e5d0d6..e690ce36e592 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -444,6 +444,7 @@  struct se_cmd {
 	unsigned		cmd_wait_set:1;
 	unsigned		unknown_data_length:1;
 	bool			state_active:1;
+	bool			send_abort_response:1;
 	u64			tag; /* SAM command identifier aka task tag */
 	/* Delay for ALUA Active/NonOptimized state access in milliseconds */
 	int			alua_nonop_delay;
@@ -496,6 +497,7 @@  struct se_cmd {
 	spinlock_t		t_state_lock;
 	struct kref		cmd_kref;
 	struct completion	t_transport_stop_comp;
+	struct completion	finished;
 
 	struct work_struct	work;