[03/19] target: Avoid that aborting a command sporadically hangs
diff mbox

Message ID 20170504225102.8931-4-bart.vanassche@sandisk.com
State New, archived
Headers show

Commit Message

Bart Van Assche May 4, 2017, 10:50 p.m. UTC
For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside
transport_generic_free_cmd() causes RDMA completion processing to stall.
Hence only sleep inside this function if the (iSCSI) target driver
requires this.

This patch avoids that messages similar to the following appear in the
kernel log:

INFO: task kworker/u25:0:1013 blocked for more than 480 seconds.
      Tainted: G        W       4.10.0-rc7-dbg+ #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u25:0   D    0  1013      2 0x00000000
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Call Trace:
 __schedule+0x2da/0xb00
 schedule+0x38/0x90
 schedule_timeout+0x2fe/0x640
 wait_for_completion+0xfe/0x160
 transport_generic_free_cmd+0x2e/0x80 [target_core_mod]
 srpt_send_done+0x59/0x9f [ib_srpt]
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_cq_poll_work+0x1b/0x60 [ib_core]
 process_one_work+0x208/0x6a0
 worker_thread+0x49/0x4a0
 kthread+0x107/0x140
 ret_from_fork+0x2e/0x40

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/target/target_core_transport.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Hannes Reinecke May 5, 2017, 6:12 a.m. UTC | #1
On 05/05/2017 12:50 AM, Bart Van Assche wrote:
> For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside
> transport_generic_free_cmd() causes RDMA completion processing to stall.
> Hence only sleep inside this function if the (iSCSI) target driver
> requires this.
> 
> This patch avoids that messages similar to the following appear in the
> kernel log:
> 
> INFO: task kworker/u25:0:1013 blocked for more than 480 seconds.
>       Tainted: G        W       4.10.0-rc7-dbg+ #3
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u25:0   D    0  1013      2 0x00000000
> Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> Call Trace:
>  __schedule+0x2da/0xb00
>  schedule+0x38/0x90
>  schedule_timeout+0x2fe/0x640
>  wait_for_completion+0xfe/0x160
>  transport_generic_free_cmd+0x2e/0x80 [target_core_mod]
>  srpt_send_done+0x59/0x9f [ib_srpt]
>  __ib_process_cq+0x4b/0xd0 [ib_core]
>  ib_cq_poll_work+0x1b/0x60 [ib_core]
>  process_one_work+0x208/0x6a0
>  worker_thread+0x49/0x4a0
>  kthread+0x107/0x140
>  ret_from_fork+0x2e/0x40
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Andy Grover <agrover@redhat.com>
> Cc: David Disseldorp <ddiss@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/target/target_core_transport.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
hch@lst.de May 5, 2017, 8:53 a.m. UTC | #2
On Thu, May 04, 2017 at 03:50:46PM -0700, Bart Van Assche wrote:
> For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside
> transport_generic_free_cmd() causes RDMA completion processing to stall.
> Hence only sleep inside this function if the (iSCSI) target driver
> requires this.

This looks reasonable to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

But as a further step can we try to move the waiting behavior entirely
into the caller that actually cares,

e.g. move the conditional target_wait_free_cmd before the call
to transport_generic_free_cmd in transport_generic_free_cmd,
and move this second wait_for_completion after the
transport_generic_free_cmd call based on an indicator (return value
or se_cmd flag)?
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche May 5, 2017, 3 p.m. UTC | #3
On Fri, 2017-05-05 at 10:53 +0200, Christoph Hellwig wrote:
> But as a further step can we try to move the waiting behavior entirely
> into the caller that actually cares,
> 
> e.g. move the conditional target_wait_free_cmd before the call
> to transport_generic_free_cmd in transport_generic_free_cmd,
> and move this second wait_for_completion after the
> transport_generic_free_cmd call based on an indicator (return value
> or se_cmd flag)?

Hello Christoph,

That sounds like a good idea to me, especially since there are only three
target drivers that set the "wait_for_tasks" argument to true, namely
tcm_loop, the iSCSI target driver and the xen-scsiback driver.

Bart.--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger May 7, 2017, 10:20 p.m. UTC | #4
On Thu, 2017-05-04 at 15:50 -0700, Bart Van Assche wrote:
> For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside
> transport_generic_free_cmd() causes RDMA completion processing to stall.
> Hence only sleep inside this function if the (iSCSI) target driver
> requires this.
> 
> This patch avoids that messages similar to the following appear in the
> kernel log:
> 
> INFO: task kworker/u25:0:1013 blocked for more than 480 seconds.
>       Tainted: G        W       4.10.0-rc7-dbg+ #3
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u25:0   D    0  1013      2 0x00000000
> Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> Call Trace:
>  __schedule+0x2da/0xb00
>  schedule+0x38/0x90
>  schedule_timeout+0x2fe/0x640
>  wait_for_completion+0xfe/0x160
>  transport_generic_free_cmd+0x2e/0x80 [target_core_mod]
>  srpt_send_done+0x59/0x9f [ib_srpt]
>  __ib_process_cq+0x4b/0xd0 [ib_core]
>  ib_cq_poll_work+0x1b/0x60 [ib_core]
>  process_one_work+0x208/0x6a0
>  worker_thread+0x49/0x4a0
>  kthread+0x107/0x140
>  ret_from_fork+0x2e/0x40
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Andy Grover <agrover@redhat.com>
> Cc: David Disseldorp <ddiss@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/target/target_core_transport.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 37f57357d4a0..df1ceb2dd110 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2553,7 +2553,8 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  	 */
>  	if (aborted) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
> -		wait_for_completion(&cmd->cmd_wait_comp);
> +		if (wait_for_tasks)
> +			wait_for_completion(&cmd->cmd_wait_comp);
>  		cmd->se_tfo->release_cmd(cmd);
>  		ret = 1;
>  	}

Grr, we have already been through this once before, remember..?

http://www.spinics.net/lists/target-devel/msg14598.html

To repeat, aborted = true is only set when transport_generic_free_cmd()
is called with wait_for_tasks = true, and neither srpt nor isert have
ever invoked transport_generic_free_cmd() with wait_for_tasks = true in
upstream code:

drivers/infiniband/ulp/isert/ib_isert.c:		transport_generic_free_cmd(&cmd->se_cmd, 0);
drivers/infiniband/ulp/isert/ib_isert.c:		transport_generic_free_cmd(&cmd->se_cmd, 0);
drivers/infiniband/ulp/isert/ib_isert.c:			transport_generic_free_cmd(&cmd->se_cmd, 0);
drivers/infiniband/ulp/srpt/ib_srpt.c:		transport_generic_free_cmd(&ioctx->cmd, 0);
drivers/infiniband/ulp/srpt/ib_srpt.c:		transport_generic_free_cmd(&ioctx->cmd, 0);
drivers/infiniband/ulp/srpt/ib_srpt.c:		transport_generic_free_cmd(&ioctx->cmd, 0);

And even if they did, this patch still would be a NOP and not make any
difference !

So it's clear what you did here, once upon a time you came up with a
bogus patch to address breakage for you own stuff in out-of-tree code,
and then copy-and-pasted the old backtrace from the out-of-tree bug and
posted it as an fix here, for a scenario that can't ever possibly occur
in upstream.

Really, please stop sending this type of garbage as it further erodes
what little trust I have left.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche May 8, 2017, 9:25 p.m. UTC | #5
On Sun, 2017-05-07 at 15:20 -0700, Nicholas A. Bellinger wrote:
> aborted = true is only set when transport_generic_free_cmd()
> is called with wait_for_tasks = true, and neither srpt nor isert have
> ever invoked transport_generic_free_cmd() with wait_for_tasks = true in
> upstream code.

Right. Although this patch is not wrong, this patch is not needed with the
current transport_generic_free_cmd() implementation but has to be folded in
the patch that eliminates the "aborted" and "tas" variables (which is not
in the series I posted).

Bart.--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger May 10, 2017, 4:48 a.m. UTC | #6
On Mon, 2017-05-08 at 21:25 +0000, Bart Van Assche wrote:
> On Sun, 2017-05-07 at 15:20 -0700, Nicholas A. Bellinger wrote:
> > aborted = true is only set when transport_generic_free_cmd()
> > is called with wait_for_tasks = true, and neither srpt nor isert have
> > ever invoked transport_generic_free_cmd() with wait_for_tasks = true in
> > upstream code.
> 
> Right. Although this patch is not wrong, this patch is not needed with the
> current transport_generic_free_cmd() implementation but has to be folded in
> the patch that eliminates the "aborted" and "tas" variables (which is not
> in the series I posted).

I don't give a toss if the patch is logically a NOP, and 'is not wrong'.

The commit message was completely bogus, and the backtrace was from some
random out-of-tree junk that was obviously not tested against the
current target-pending/for-next.

And to top it off, I've already pointed out these facts three months
ago, and you just ignored the response.

The point is, it's sloppy and means I have to repeat myself over and
over again.

Not to mention in that very same thread from three months ago, I asked
(again( to stop mixing bug-fixes with random improvements:

http://www.spinics.net/lists/target-devel/msg14665.html

"To reiterate the importance of having bug-fixes, especially those
intended for stable, always be leading other patches..

There is no way for a maintainer to know which bug-fixes are to existing
code unless they precede all other patches and not intermixed with
various other changes.  The bug-fixes to existing upstream code need to
be tested on their own without the other changes (especially those that
effect the same area) invalidating the tests."

How many more times will I have to repeat myself before it sinks in..?

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 37f57357d4a0..df1ceb2dd110 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2553,7 +2553,8 @@  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 	 */
 	if (aborted) {
 		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
-		wait_for_completion(&cmd->cmd_wait_comp);
+		if (wait_for_tasks)
+			wait_for_completion(&cmd->cmd_wait_comp);
 		cmd->se_tfo->release_cmd(cmd);
 		ret = 1;
 	}