[v2] target: core: remove from tmr_list at lun unlink

Message ID	20210416092146.3201-1-d.bogdanov@yadro.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <target-devel-owner@kernel.org> From: Dmitry Bogdanov <d.bogdanov@yadro.com> To: Martin Petersen <martin.petersen@oracle.com>, <target-devel@vger.kernel.org> CC: <linux-scsi@vger.kernel.org>, <linux@yadro.com>, Dmitry Bogdanov <d.bogdanov@yadro.com>, Roman Bolshakov <r.bolshakov@yadro.com> Subject: [PATCH v2] target: core: remove from tmr_list at lun unlink Date: Fri, 16 Apr 2021 12:21:46 +0300 Message-ID: <20210416092146.3201-1-d.bogdanov@yadro.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: bulk
Series	[v2] target: core: remove from tmr_list at lun unlink \| expand [v2] target: core: remove from tmr_list at lun unlink

Dmitry Bogdanov April 16, 2021, 9:21 a.m. UTC

Currently TMF commands are removed from de_device.dev_tmf_list at
the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
up on a command status (response) is queued in transport layer.
It means that LUN and backend device can be deleted meantime and at
the moment of repsonse completion a panic is occured:

target_tmr_work()
	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
- // - // - // -
<<<--- lun remove
<<<--- core backend device remove
- // - // - // -
qlt_handle_abts_completion()
  tfo->free_mcmd()
    transport_generic_free_cmd()
      target_put_sess_cmd()
        core_tmr_release_req() {
          if (dev) { // backend device, can not be null
            spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH

Call Trace:
NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
Call Trace:
(unreliable)
0x0
target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
process_one_work+0x2c4/0x5c0
worker_thread+0x88/0x690

For FC protocol it is a race condition, but for iSCSI protocol it is
easyly reproduced by manual sending iSCSI commands:
- Send some SCSI sommand
- Send Abort of that command over iSCSI
- Remove LUN on target
- Send next iSCSI command to acknowledge the Abort_Response
- target panics

There is no sense to keep the command in tmr_list until response
completion, so move the removal from tmr_list from the response
completion to the response queueing when lun is unlinked.
Move the removal from state list too as it is a subject to the same
race condition.

Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>

---
v2:
 fix cmd stuck in tmr list in error case in iscsi
 move clearing cmd->se_lun to transport_lun_remove_cmd

The issue exists from the very begining.
I uploaded a scapy script that helps to reproduce the issue at
https://gist.github.com/logost/cb93df41dd2432454324449b390403c4
---
 drivers/target/iscsi/iscsi_target.c    |  2 +-
 drivers/target/target_core_tmr.c       |  9 --------
 drivers/target/target_core_transport.c | 29 +++++++++++++++++++-------
 3 files changed, 23 insertions(+), 17 deletions(-)

Mike Christie April 16, 2021, 9:38 p.m. UTC | #1

On 4/16/21 4:21 AM, Dmitry Bogdanov wrote:
> Currently TMF commands are removed from de_device.dev_tmf_list at
> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
> up on a command status (response) is queued in transport layer.
> It means that LUN and backend device can be deleted meantime and at
> the moment of repsonse completion a panic is occured:
> 
> target_tmr_work()
> 	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
> 	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
> - // - // - // -
> <<<--- lun remove
> <<<--- core backend device remove
> - // - // - // -
> qlt_handle_abts_completion()
>   tfo->free_mcmd()
>     transport_generic_free_cmd()
>       target_put_sess_cmd()
>         core_tmr_release_req() {
>           if (dev) { // backend device, can not be null
>             spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
> 

Sorry about this. I was thinking about this patch some more while
reviewing this version.

Is there another race possible?

1. We have a cmd running in lio.
2. The initiator times it out and sends tmf.
3. cmd completes.
4. target_submit_tmr has called transport_lookup_tmr_lun.
5. transport_lookup_tmr_lun has set se_dev and tmr_dev.
6. You now remove the lun and device.
7. Now you crash on the dev references.

I think we need to do a percpu_ref_tryget_live in transport_lookup_tmr_lun
like is done in transport_lookup_cmd_lun. If successful set

se_cmd->lun_ref_active = true;

If we get the ref, we drop it in transport_lun_remove_cmd like with
the non tmr case.

Combined with this patch then I think we are ok for all races.

Mike Christie April 16, 2021, 9:41 p.m. UTC | #2

On 4/16/21 4:38 PM, Mike Christie wrote:
> On 4/16/21 4:21 AM, Dmitry Bogdanov wrote:
>> Currently TMF commands are removed from de_device.dev_tmf_list at
>> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
>> up on a command status (response) is queued in transport layer.
>> It means that LUN and backend device can be deleted meantime and at
>> the moment of repsonse completion a panic is occured:
>>
>> target_tmr_work()
>> 	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
>> 	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
>> - // - // - // -
>> <<<--- lun remove
>> <<<--- core backend device remove
>> - // - // - // -
>> qlt_handle_abts_completion()
>>   tfo->free_mcmd()
>>     transport_generic_free_cmd()
>>       target_put_sess_cmd()
>>         core_tmr_release_req() {
>>           if (dev) { // backend device, can not be null
>>             spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
>>
> 
> Sorry about this. I was thinking about this patch some more while
> reviewing this version.
> 
> Is there another race possible?
> 
> 1. We have a cmd running in lio.
> 2. The initiator times it out and sends tmf.
> 3. cmd completes.
> 4. target_submit_tmr has called transport_lookup_tmr_lun.
> 5. transport_lookup_tmr_lun has set se_dev and tmr_dev.
> 6. You now remove the lun and device.
> 7. Now you crash on the dev references.
> 
> I think we need to do a percpu_ref_tryget_live in transport_lookup_tmr_lun
> like is done in transport_lookup_cmd_lun. If successful set
> 
> se_cmd->lun_ref_active = true;
> 
> If we get the ref, we drop it in transport_lun_remove_cmd like with
> the non tmr case.
> 

Ignore this. I see we do a percpu_ref_tryget_live in there already.

Mike Christie April 16, 2021, 10:39 p.m. UTC | #3

On 4/16/21 4:21 AM, Dmitry Bogdanov wrote:
> Currently TMF commands are removed from de_device.dev_tmf_list at
> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
> up on a command status (response) is queued in transport layer.
> It means that LUN and backend device can be deleted meantime and at
> the moment of repsonse completion a panic is occured:
> 
> target_tmr_work()
> 	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
> 	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
> - // - // - // -
> <<<--- lun remove
> <<<--- core backend device remove
> - // - // - // -
> qlt_handle_abts_completion()
>   tfo->free_mcmd()
>     transport_generic_free_cmd()
>       target_put_sess_cmd()
>         core_tmr_release_req() {
>           if (dev) { // backend device, can not be null
>             spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
> 
> Call Trace:
> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
> Call Trace:
> (unreliable)
> 0x0
> target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
> process_one_work+0x2c4/0x5c0
> worker_thread+0x88/0x690
> 
> For FC protocol it is a race condition, but for iSCSI protocol it is
> easyly reproduced by manual sending iSCSI commands:
> - Send some SCSI sommand
> - Send Abort of that command over iSCSI
> - Remove LUN on target
> - Send next iSCSI command to acknowledge the Abort_Response
> - target panics
> 
> There is no sense to keep the command in tmr_list until response
> completion, so move the removal from tmr_list from the response
> completion to the response queueing when lun is unlinked.
> Move the removal from state list too as it is a subject to the same
> race condition.
> 
> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
> 
> ---
> v2:
>  fix cmd stuck in tmr list in error case in iscsi
>  move clearing cmd->se_lun to transport_lun_remove_cmd
> 
> The issue exists from the very begining.
> I uploaded a scapy script that helps to reproduce the issue at
> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!MjANCRIp5ZrtKYonxEKclROOwOe7s-adKHLiUd2Njis6m1774RMpLGNkHyapFf68BwFr$ 
> ---
>  drivers/target/iscsi/iscsi_target.c    |  2 +-
>  drivers/target/target_core_tmr.c       |  9 --------
>  drivers/target/target_core_transport.c | 29 +++++++++++++++++++-------
>  3 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index d0e7ed8f28cc..3311b031a812 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
>  	 * TMR TASK_REASSIGN.
>  	 */
>  	iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
> -	target_put_sess_cmd(&cmd->se_cmd);
> +	transport_generic_free_cmd(&cmd->se_cmd, false);
>  	return 0;
>  }

Doh. I see how I got all confused. I guess this path leaks the lun_ref
taken by transport_lookup_tmr_lun. It looks like an old issue and nothing
to do with your patch.

I'm not sure if we are supposed to be calling transport_generic_free_cmd
twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL"
in transport_lun_remove_cmd, so we won't do a double list deletion.
It feels dirty though. I can feel Bart saying, "Mike did you see the comment
at the top of the function". :)

Maybe it's best to more cleanly unwind what was setup in the rror path. I think we
can fix lun_ref leak too.

So instead of doing transport_generic_free_cmd above do transport_lun_remove_cmd
to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd?

Mike Christie April 16, 2021, 10:50 p.m. UTC | #4

On 4/16/21 5:39 PM, Mike Christie wrote:
> On 4/16/21 4:21 AM, Dmitry Bogdanov wrote:
>> Currently TMF commands are removed from de_device.dev_tmf_list at
>> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
>> up on a command status (response) is queued in transport layer.
>> It means that LUN and backend device can be deleted meantime and at
>> the moment of repsonse completion a panic is occured:
>>
>> target_tmr_work()
>> 	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
>> 	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
>> - // - // - // -
>> <<<--- lun remove
>> <<<--- core backend device remove
>> - // - // - // -
>> qlt_handle_abts_completion()
>>   tfo->free_mcmd()
>>     transport_generic_free_cmd()
>>       target_put_sess_cmd()
>>         core_tmr_release_req() {
>>           if (dev) { // backend device, can not be null
>>             spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
>>
>> Call Trace:
>> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
>> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
>> Call Trace:
>> (unreliable)
>> 0x0
>> target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
>> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
>> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
>> process_one_work+0x2c4/0x5c0
>> worker_thread+0x88/0x690
>>
>> For FC protocol it is a race condition, but for iSCSI protocol it is
>> easyly reproduced by manual sending iSCSI commands:
>> - Send some SCSI sommand
>> - Send Abort of that command over iSCSI
>> - Remove LUN on target
>> - Send next iSCSI command to acknowledge the Abort_Response
>> - target panics
>>
>> There is no sense to keep the command in tmr_list until response
>> completion, so move the removal from tmr_list from the response
>> completion to the response queueing when lun is unlinked.
>> Move the removal from state list too as it is a subject to the same
>> race condition.
>>
>> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
>> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
>> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
>>
>> ---
>> v2:
>>  fix cmd stuck in tmr list in error case in iscsi
>>  move clearing cmd->se_lun to transport_lun_remove_cmd
>>
>> The issue exists from the very begining.
>> I uploaded a scapy script that helps to reproduce the issue at
>> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!MjANCRIp5ZrtKYonxEKclROOwOe7s-adKHLiUd2Njis6m1774RMpLGNkHyapFf68BwFr$ 
>> ---
>>  drivers/target/iscsi/iscsi_target.c    |  2 +-
>>  drivers/target/target_core_tmr.c       |  9 --------
>>  drivers/target/target_core_transport.c | 29 +++++++++++++++++++-------
>>  3 files changed, 23 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
>> index d0e7ed8f28cc..3311b031a812 100644
>> --- a/drivers/target/iscsi/iscsi_target.c
>> +++ b/drivers/target/iscsi/iscsi_target.c
>> @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
>>  	 * TMR TASK_REASSIGN.
>>  	 */
>>  	iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
>> -	target_put_sess_cmd(&cmd->se_cmd);
>> +	transport_generic_free_cmd(&cmd->se_cmd, false);
>>  	return 0;
>>  }
> 
> Doh. I see how I got all confused. I guess this path leaks the lun_ref
> taken by transport_lookup_tmr_lun. It looks like an old issue and nothing
> to do with your patch.
> 
> I'm not sure if we are supposed to be calling transport_generic_free_cmd
> twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL"
> in transport_lun_remove_cmd, so we won't do a double list deletion.
> It feels dirty though. I can feel Bart saying, "Mike did you see the comment
> at the top of the function". :)
> 
> Maybe it's best to more cleanly unwind what was setup in the rror path. I think we
> can fix lun_ref leak too.
> 
> So instead of doing transport_generic_free_cmd above do transport_lun_remove_cmd
> to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd?

Shoot. I'm all over the place. I think the root issue is my original comment on the v1
patch was wrong.

On a failure we would still do:

iscsit_free_cmd -> transport_generic_free_cmd -> transport_lun_remove_cmd

right? So we don't need any change in the iscsi target. It should all
just work.

Konstantin Shelekhin April 17, 2021, 9:39 a.m. UTC | #5

On Fri, Apr 16, 2021 at 12:21:46PM +0300, Dmitry Bogdanov wrote:
> @@ -719,8 +726,16 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
>  	if (!lun)
>  		return;
>  
> +	target_remove_from_state_list(cmd);
> +	target_remove_from_tmr_list(cmd);
> +
>  	if (cmpxchg(&cmd->lun_ref_active, true, false))
>  		percpu_ref_put(&lun->lun_ref);
> +
> +	/*
> +	 * Clear struct se_cmd->se_lun before the handoff to FE.
> +	 */
> +	cmd->se_lun = NULL;
>  }

Sadly we just found out that this code is racing with
core_tmr_drain_tmr_list(). If LUN RESET comes in while there are still
some outstanding ABORT TASK functions left, the following sequence is
possible:

  1. During LUN RESET processing core_tmr_drain_tmr_list() is called
  2. During ABORT TASK processing transport_lun_remove_cmd() is called
     at the sime time
  3. core_tmr_drain_tmr_list() acquires &dev->se_tmr_lock lock and moves
     TMRs to the on-stack drain_tmr_list
  4. core_tmr_drain_tmr_list() releases &dev->se_tmr_lock and starts
     working on the drain_tmr_list
  5. At the same moment target_remove_from_tmr_list() is called
  6. It acquires &dev->se_tmr_lock, removes TMR from the list by
     list_del_init() and releases &dev->se_tmr_lock

What happens next is this:

  [  391.438244] LUN_RESET:  releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11
  [  391.438246] LUN_RESET:  releasing TMR 00000000e2ee2634 Function: 0x01, Response: 0x05, t_state: 11

The same TMR is being pulled out twice out of the drain_tmr_list. This
happens because there are no locks that prevent the list traversal in
core_tmr_drain_tmr_list() and the list element removal in
target_remove_from_tmr_list() from being executed concurrently. So
list_del_init() in target_remove_from_tmr_list() calls INIT_LIST_HEAD()
and tmr_p->next now points to tmr_p.

Hence the following warnings:

  [  391.438300] WARNING: CPU: 12 PID: 20064 at ../drivers/target/target_core_transport.c:2785
  ...
  [  391.438448] WARNING: CPU: 12 PID: 20064 at ../lib/refcount.c:28 refcount_warn_saturate+0x224/0x240

This issue also prevents other TMRs from being released, resulting in a
stuck session. Not always, since sometimes drain_tmr_list only contains
one element, but still possible.

Dmitry Bogdanov April 21, 2021, 3:07 p.m. UTC | #6

Hi Mike,

>>> --- a/drivers/target/iscsi/iscsi_target.c
>>> +++ b/drivers/target/iscsi/iscsi_target.c
>>> @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
>>>  	 * TMR TASK_REASSIGN.
>>>  	 */
>>>  	iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
>>> -	target_put_sess_cmd(&cmd->se_cmd);
>>> +	transport_generic_free_cmd(&cmd->se_cmd, false);
>>>  	return 0;
>>>  }
>> 
>> Doh. I see how I got all confused. I guess this path leaks the lun_ref 
>> taken by transport_lookup_tmr_lun. It looks like an old issue and 
>> nothing to do with your patch.

>> I'm not sure if we are supposed to be calling 
>> transport_generic_free_cmd twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL"
>> in transport_lun_remove_cmd, so we won't do a double list deletion.
>> It feels dirty though. I can feel Bart saying, "Mike did you see the 
>> comment at the top of the function". :)
>> 
>> Maybe it's best to more cleanly unwind what was setup in the rror 
>> path. I think we can fix lun_ref leak too.
>> 
>> So instead of doing transport_generic_free_cmd above do 
>> transport_lun_remove_cmd to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd?
>
>Shoot. I'm all over the place. I think the root issue is my original comment on the v1 patch was wrong.
>On a failure we would still do:
>iscsit_free_cmd -> transport_generic_free_cmd -> transport_lun_remove_cmd
>right? So we don't need any change in the iscsi target. It should all just work.

iscsit_free_cmd will be called only at response completion(next incoming iscsi command): iscsit_ack_from_expstatsn -> iscsit_free_cmd.
That produces some unacceptable delay of lun unlinking from cmd. There is a bug report to the similar behavior:
http://lkml.iu.edu/hypermail/linux/kernel/2002.0/05272.html
Because of that complain, the commit 83f85b8ec305, that solves the same crash as I am fixing,  was reverted.

So, this piece of patch has some indirect relation :)
I will extract it to a separate patch in the coming patchset on TMF handling.

BR,
 Dmitry

[v2] target: core: remove from tmr_list at lun unlink

Commit Message

Comments

Patch