[1/2] iscsi-target: Fix TMR reference leak during session shutdown
diff mbox

Message ID 1490862594-2712-2-git-send-email-nab@linux-iscsi.org
State New, archived
Headers show

Commit Message

Nicholas A. Bellinger March 30, 2017, 8:29 a.m. UTC
From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch fixes a iscsi-target specific TMR reference leak
during session shutdown, that could occur when a TMR was
quiesced before the hand-off back to iscsi-target code
via transport_cmd_check_stop_to_fabric().

The reference leak happens because iscsit_free_cmd() was
incorrectly skipping the final target_put_sess_cmd() for
TMRs when transport_generic_free_cmd() returned zero because
the se_cmd->cmd_kref did not reach zero, due to the missing
se_cmd assignment in original code.

The result was iscsi_cmd and it's associated se_cmd memory
would be freed once se_sess->sess_cmd_map where released,
but the associated se_tmr_req was leaked and remained part
of se_device->dev_tmr_list.

This bug would manfiest itself as kernel paging request
OOPsen in core_tmr_lun_reset(), when a left-over se_tmr_req
attempted to dereference it's se_cmd pointer that had
already been released during normal session shutdown.

To address this bug, go ahead and treat ISCSI_OP_SCSI_CMD
and ISCSI_OP_SCSI_TMFUNC the same when there is an extra
se_cmd->cmd_kref to drop in iscsit_free_cmd(), and use
op_scsi to signal __iscsit_free_cmd() when the former
needs to clear any further iscsi related I/O state.

Reported-by: Rob Millner <rlm@daterainc.com>
Cc: Rob Millner <rlm@daterainc.com>
Reported-by: Chu Yuan Lin <cyl@datera.io>
Cc: Chu Yuan Lin <cyl@datera.io>
Tested-by: Chu Yuan Lin <cyl@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/target/iscsi/iscsi_target_util.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Bart Van Assche March 30, 2017, 5:08 p.m. UTC | #1
On Thu, 2017-03-30 at 08:29 +0000, Nicholas A. Bellinger wrote:
> diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
> index 5041a9c..b464033 100644
> --- a/drivers/target/iscsi/iscsi_target_util.c
> +++ b/drivers/target/iscsi/iscsi_target_util.c
> @@ -737,21 +737,23 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown)
>  {
>  	struct se_cmd *se_cmd = NULL;
>  	int rc;
> +	bool op_scsi = false;
>  	/*
>  	 * Determine if a struct se_cmd is associated with
>  	 * this struct iscsi_cmd.
>  	 */
>  	switch (cmd->iscsi_opcode) {
>  	case ISCSI_OP_SCSI_CMD:
> -		se_cmd = &cmd->se_cmd;
> -		__iscsit_free_cmd(cmd, true, shutdown);
> +		op_scsi = true;
>  		/*
>  		 * Fallthrough
>  		 */
>  	case ISCSI_OP_SCSI_TMFUNC:
> -		rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown);
> -		if (!rc && shutdown && se_cmd && se_cmd->se_sess) {
> -			__iscsit_free_cmd(cmd, true, shutdown);
> +		se_cmd = &cmd->se_cmd;
> +		__iscsit_free_cmd(cmd, op_scsi, shutdown);
> +		rc = transport_generic_free_cmd(se_cmd, shutdown);
> +		if (!rc && shutdown && se_cmd->se_sess) {
> +			__iscsit_free_cmd(cmd, op_scsi, shutdown);
>  			target_put_sess_cmd(se_cmd);
>  		}
>  		break;

Hello Nic,

I agree that this patch fixes a leak. However, an existing bug in
iscsit_free_cmd() is not addressed by this patch. Before the TMF code started
using kref_get() / kref_put() it was possible for transport_generic_free_cmd()
to determine whether or not iscsit_free_cmd() should call __iscsit_free_cmd()
by checking the command reference count. I think that since the TMF code
manipulates the command reference count it is no longer possible for
transport_generic_free_cmd() to determine this. If iscsit_free_cmd() is called
while a LUN RESET is in progress then the return value of
transport_generic_free_cmd() will be wrong. I will post a few patches that not
only address what I just described but also the leak fixed by this patch.

Bart.--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger April 2, 2017, 10:38 p.m. UTC | #2
On Thu, 2017-03-30 at 17:08 +0000, Bart Van Assche wrote:
> On Thu, 2017-03-30 at 08:29 +0000, Nicholas A. Bellinger wrote:
> > diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
> > index 5041a9c..b464033 100644
> > --- a/drivers/target/iscsi/iscsi_target_util.c
> > +++ b/drivers/target/iscsi/iscsi_target_util.c
> > @@ -737,21 +737,23 @@ void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown)
> >  {
> >  	struct se_cmd *se_cmd = NULL;
> >  	int rc;
> > +	bool op_scsi = false;
> >  	/*
> >  	 * Determine if a struct se_cmd is associated with
> >  	 * this struct iscsi_cmd.
> >  	 */
> >  	switch (cmd->iscsi_opcode) {
> >  	case ISCSI_OP_SCSI_CMD:
> > -		se_cmd = &cmd->se_cmd;
> > -		__iscsit_free_cmd(cmd, true, shutdown);
> > +		op_scsi = true;
> >  		/*
> >  		 * Fallthrough
> >  		 */
> >  	case ISCSI_OP_SCSI_TMFUNC:
> > -		rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown);
> > -		if (!rc && shutdown && se_cmd && se_cmd->se_sess) {
> > -			__iscsit_free_cmd(cmd, true, shutdown);
> > +		se_cmd = &cmd->se_cmd;
> > +		__iscsit_free_cmd(cmd, op_scsi, shutdown);
> > +		rc = transport_generic_free_cmd(se_cmd, shutdown);
> > +		if (!rc && shutdown && se_cmd->se_sess) {
> > +			__iscsit_free_cmd(cmd, op_scsi, shutdown);
> >  			target_put_sess_cmd(se_cmd);
> >  		}
> >  		break;
> 
> Hello Nic,
> 
> I agree that this patch fixes a leak. However, an existing bug in
> iscsit_free_cmd() is not addressed by this patch. Before the TMF code started
> using kref_get() / kref_put() it was possible for transport_generic_free_cmd()
> to determine whether or not iscsit_free_cmd() should call __iscsit_free_cmd()
> by checking the command reference count. I think that since the TMF code
> manipulates the command reference count it is no longer possible for
> transport_generic_free_cmd() to determine this. If iscsit_free_cmd() is called
> while a LUN RESET is in progress then the return value of
> transport_generic_free_cmd() will be wrong. 

No.  Your assumption is incorrect wrt transport_generic_free_cmd()
having a wrong return value during LUN_RESET.

It's incorrect because when iscsit_free_cmd() is called with
shutdown=true resulting in transport_generic_free_cmd() with
wait_for_tasks=true, target_wait_free_cmd() checks for CMD_T_ABORTED to
determine if se_cmd has been aborted by target_core_tmr.c logic, and
returns aborted=true back up to transport_generic_free_cmd().

When transport_generic_free_cmd() gets aborted=true, it waits for
se_cmd->cmd_wait_comp to finish and calls cmd->se_tfo->release_cmd()
to release se_cmd back to the pre-allocated session pool.

At this point, transport_generic_free_cmd() with aborted=true must
return '1' it's caller, signaling iscsit_free_cmd() to not attempt to
perform the extra __iscsi_free_cmd() or target_put_sess_cmd(), because
se_cmd has already been released back to the session pool.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche April 17, 2017, 9:33 p.m. UTC | #3
On Sun, 2017-04-02 at 15:38 -0700, Nicholas A. Bellinger wrote:
> No.  Your assumption is incorrect wrt transport_generic_free_cmd()
> having a wrong return value during LUN_RESET.
> 
> It's incorrect because when iscsit_free_cmd() is called with
> shutdown=true resulting in transport_generic_free_cmd() with
> wait_for_tasks=true, target_wait_free_cmd() checks for CMD_T_ABORTED to
> determine if se_cmd has been aborted by target_core_tmr.c logic, and
> returns aborted=true back up to transport_generic_free_cmd().
> 
> When transport_generic_free_cmd() gets aborted=true, it waits for
> se_cmd->cmd_wait_comp to finish and calls cmd->se_tfo->release_cmd()
> to release se_cmd back to the pre-allocated session pool.
> 
> At this point, transport_generic_free_cmd() with aborted=true must
> return '1' it's caller, signaling iscsit_free_cmd() to not attempt to
> perform the extra __iscsi_free_cmd() or target_put_sess_cmd(), because
> se_cmd has already been released back to the session pool.

Hello Nic,

Today the iSCSI target driver relies on transport_generic_free_cmd() to figure
out whether target_put_sess_cmd() has been called once or twice for a SCSI
command. transport_generic_free_cmd() relies on the command reference count and
the "aborted" flag to perform that check. I think that's an unfortunate design
choice because it makes it impossible to use the command reference count outside
any context where it is used today (calling .release_cmd() upon final put / TMF
handling), e.g. to simplify the implementation of target_wait_for_sess_cmds().
This choice also makes it necessary to perform direct calls to .release_cmd()
from several functions, something that is really unfortunate. I think it is
important to make all that code easier to understand and to verify. Hence my
proposal to introduce a second reference count that tracks the number of
references a target driver holds on a SCSI command. Such a patch has been posted
before. See also "target: Introduce a target driver reference count per command"
(February 9, 2017, https://www.spinics.net/lists/target-devel/msg14487.html). Is
it OK for you if I rework my most recently posted iSCSI target driver patch
series on top of that patch?

Bart.--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger May 2, 2017, 4:38 a.m. UTC | #4
On Mon, 2017-04-17 at 21:33 +0000, Bart Van Assche wrote:
> On Sun, 2017-04-02 at 15:38 -0700, Nicholas A. Bellinger wrote:
> > No.  Your assumption is incorrect wrt transport_generic_free_cmd()
> > having a wrong return value during LUN_RESET.
> > 
> > It's incorrect because when iscsit_free_cmd() is called with
> > shutdown=true resulting in transport_generic_free_cmd() with
> > wait_for_tasks=true, target_wait_free_cmd() checks for CMD_T_ABORTED to
> > determine if se_cmd has been aborted by target_core_tmr.c logic, and
> > returns aborted=true back up to transport_generic_free_cmd().
> > 
> > When transport_generic_free_cmd() gets aborted=true, it waits for
> > se_cmd->cmd_wait_comp to finish and calls cmd->se_tfo->release_cmd()
> > to release se_cmd back to the pre-allocated session pool.
> > 
> > At this point, transport_generic_free_cmd() with aborted=true must
> > return '1' it's caller, signaling iscsit_free_cmd() to not attempt to
> > perform the extra __iscsi_free_cmd() or target_put_sess_cmd(), because
> > se_cmd has already been released back to the session pool.
> 
> Hello Nic,
> 
> Today the iSCSI target driver relies on transport_generic_free_cmd() to figure
> out whether target_put_sess_cmd() has been called once or twice for a SCSI
> command. transport_generic_free_cmd() relies on the command reference count and
> the "aborted" flag to perform that check. I think that's an unfortunate design
> choice because it makes it impossible to use the command reference count outside
> any context where it is used today (calling .release_cmd() upon final put / TMF
> handling), e.g. to simplify the implementation of target_wait_for_sess_cmds().
> This choice also makes it necessary to perform direct calls to .release_cmd()
> from several functions, something that is really unfortunate. I think it is
> important to make all that code easier to understand and to verify. Hence my
> proposal to introduce a second reference count that tracks the number of
> references a target driver holds on a SCSI command. Such a patch has been posted
> before. See also "target: Introduce a target driver reference count per command"
> (February 9, 2017, https://www.spinics.net/lists/target-devel/msg14487.html). Is
> it OK for you if I rework my most recently posted iSCSI target driver patch
> series on top of that patch?

There is no valid purpose to introduce a second se_cmd kref, and you've
not been able to articulate a reason for doing so other than 'it makes
the code easier to read'.

So no, unless there is a specific bug that absolutely cannot be
addressed without adding a second se_cmd kref, I'll not be considering
these patches for reasons I've mentioned numerous times already.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 5041a9c..b464033 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -737,21 +737,23 @@  void iscsit_free_cmd(struct iscsi_cmd *cmd, bool shutdown)
 {
 	struct se_cmd *se_cmd = NULL;
 	int rc;
+	bool op_scsi = false;
 	/*
 	 * Determine if a struct se_cmd is associated with
 	 * this struct iscsi_cmd.
 	 */
 	switch (cmd->iscsi_opcode) {
 	case ISCSI_OP_SCSI_CMD:
-		se_cmd = &cmd->se_cmd;
-		__iscsit_free_cmd(cmd, true, shutdown);
+		op_scsi = true;
 		/*
 		 * Fallthrough
 		 */
 	case ISCSI_OP_SCSI_TMFUNC:
-		rc = transport_generic_free_cmd(&cmd->se_cmd, shutdown);
-		if (!rc && shutdown && se_cmd && se_cmd->se_sess) {
-			__iscsit_free_cmd(cmd, true, shutdown);
+		se_cmd = &cmd->se_cmd;
+		__iscsit_free_cmd(cmd, op_scsi, shutdown);
+		rc = transport_generic_free_cmd(se_cmd, shutdown);
+		if (!rc && shutdown && se_cmd->se_sess) {
+			__iscsit_free_cmd(cmd, op_scsi, shutdown);
 			target_put_sess_cmd(se_cmd);
 		}
 		break;