[3/3] tcmu: remove cmd timeout code

Message ID	1488431681-15481-4-git-send-email-mchristi@redhat.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <target-devel-owner@kernel.org> From: Mike Christie <mchristi@redhat.com> To: target-devel@vger.kernel.org, nab@linux-iscsi.org Cc: Mike Christie <mchristi@redhat.com> Subject: [PATCH 3/3] tcmu: remove cmd timeout code Date: Wed, 1 Mar 2017 23:14:41 -0600 Message-Id: <1488431681-15481-4-git-send-email-mchristi@redhat.com> In-Reply-To: <1488431681-15481-1-git-send-email-mchristi@redhat.com> References: <1488431681-15481-1-git-send-email-mchristi@redhat.com> Sender: target-devel-owner@vger.kernel.org Precedence: bulk

Mike Christie March 2, 2017, 5:14 a.m. UTC

This patch removes the tcmu the cmd timeout code. Like with the core
target_core_tmr.c code, the kernel does not know the state of the command
or implemenation details of the device, so it might not be safe to just
return the command to the initiator for a possible retry.

Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 drivers/target/target_core_user.c | 83 +--------------------------------------
 1 file changed, 1 insertion(+), 82 deletions(-)

Nicholas A. Bellinger March 8, 2017, 9:05 a.m. UTC | #1

Hi MNC,

I know Andy has stepped back from TCMU, but I wanted to get his input
(CC'ed) as the original author.

On Wed, 2017-03-01 at 23:14 -0600, Mike Christie wrote:
> This patch removes the tcmu the cmd timeout code. Like with the core
> target_core_tmr.c code, the kernel does not know the state of the command
> or implemenation details of the device, so it might not be safe to just
> return the command to the initiator for a possible retry.
> 
> Signed-off-by: Mike Christie <mchristi@redhat.com>

Applied for the moment, but I'm curious wrt the implications for this..?

Does this mean all outstanding se_cmd descriptors dispatched into
user-space must always complete outstanding I/Os, otherwise target-core
will block indefinitely waiting for completion a la other in-kernel
backend drivers..? 

Historically we've always considered a backend device that never
completes outstanding se_cmd descriptors to be a buggy driver.
But existing user-space code that uses TCMU (I assume) is allowed to
restart, and is currently not responsible for completing I/Os after the
restart..?  Is that the case..?

I wonder if this change may end up being problematic because of this
assumption by user-space..?

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mike Christie March 8, 2017, 6:02 p.m. UTC | #2

On 03/08/2017 03:05 AM, Nicholas A. Bellinger wrote:
> Hi MNC,
> 
> I know Andy has stepped back from TCMU, but I wanted to get his input
> (CC'ed) as the original author.
> 
> On Wed, 2017-03-01 at 23:14 -0600, Mike Christie wrote:
>> This patch removes the tcmu the cmd timeout code. Like with the core
>> target_core_tmr.c code, the kernel does not know the state of the command
>> or implemenation details of the device, so it might not be safe to just
>> return the command to the initiator for a possible retry.
>>
>> Signed-off-by: Mike Christie <mchristi@redhat.com>
> 
> Applied for the moment, but I'm curious wrt the implications for this..?
> 
> Does this mean all outstanding se_cmd descriptors dispatched into
> user-space must always complete outstanding I/Os, otherwise target-core
> will block indefinitely waiting for completion a la other in-kernel
> backend drivers..? 

Yes.

> 
> Historically we've always considered a backend device that never
> completes outstanding se_cmd descriptors to be a buggy driver.
> But existing user-space code that uses TCMU (I assume) is allowed to
> restart, and is currently not responsible for completing I/Os after the
> restart..?  Is that the case..?
> 
> I wonder if this change may end up being problematic because of this
> assumption by user-space..?

I think we need a module param or per device flag.

There is this kernel timeout code that handles the easy part where we
can get se_cmd descriptors completed.

There is no tcmu-runner restart code to figure out the state of those
commands. For example if we were in the WRITE part of a
COMPARE_AND_WRITE and the WRITE is floating around at the time of the
tcmu-runner crash, then when it starts up again, we would just allow 2
COMPARE_AND_WRITEs to be executing at the same time.

However, I see your point that other daemons could be implementing the
needed cleanup and be safe.

How about a modparam that specifies the timeout length. If set to 0,
then we do not timeout internally. tcmu-runner would set this. The
default is 30 like it is currently.

My tcmu-runner plan was:

1. For hung commands, we are going to add callouts to the backend module
for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
to have the backend perform the TMF. target_core_user would then call
into tcmu-runner and that would call into the userspace handler.

2. For tcmu-runner restarts, I am working on the ability to block
backends, so at restart time tcmur-runner can loop over the oustanding
commands and call into handlers to figure out if they are running still.
If they are, I am debating what to do still. We could have tcmu runner
will call the handler callouts used for #1 if needed.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andy Grover March 8, 2017, 7:45 p.m. UTC | #3

On 03/08/2017 10:02 AM, Mike Christie wrote:
> On 03/08/2017 03:05 AM, Nicholas A. Bellinger wrote:
>> Hi MNC,
>>
>> I know Andy has stepped back from TCMU, but I wanted to get his input
>> (CC'ed) as the original author.

Sure. I'm in agreement with Mike on what needs to happen, but I can 
maybe add some comments from a "historical" TCMU angle.

>> On Wed, 2017-03-01 at 23:14 -0600, Mike Christie wrote:
>>> This patch removes the tcmu the cmd timeout code. Like with the core
>>> target_core_tmr.c code, the kernel does not know the state of the command
>>> or implemenation details of the device, so it might not be safe to just
>>> return the command to the initiator for a possible retry.
>>>
>>> Signed-off-by: Mike Christie <mchristi@redhat.com>
>>
>> Applied for the moment, but I'm curious wrt the implications for this..?
>>
>> Does this mean all outstanding se_cmd descriptors dispatched into
>> user-space must always complete outstanding I/Os, otherwise target-core
>> will block indefinitely waiting for completion a la other in-kernel
>> backend drivers..?
>
> Yes.

Yeah. originally having this timeout was not correct. The initiator is 
responsible for aborting (via a TMF) and retrying commands, and 
intermediate layers like TCMU having their own timeouts is a bad idea.

>> Historically we've always considered a backend device that never
>> completes outstanding se_cmd descriptors to be a buggy driver.
>> But existing user-space code that uses TCMU (I assume) is allowed to
>> restart, and is currently not responsible for completing I/Os after the
>> restart..?  Is that the case..?
>>
>> I wonder if this change may end up being problematic because of this
>> assumption by user-space..?
>
> I think we need a module param or per device flag.
>
> There is this kernel timeout code that handles the easy part where we
> can get se_cmd descriptors completed.
>
> There is no tcmu-runner restart code to figure out the state of those
> commands. For example if we were in the WRITE part of a
> COMPARE_AND_WRITE and the WRITE is floating around at the time of the
> tcmu-runner crash, then when it star
> then we do not timeout internally. tcmu-runner would set this. The
> default is 30 like it is currently.
>
> My tcmu-runner plan was:
>
> 1. For hung commands, we are going to add callouts to the backend module
> for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
> to have the backend perform the TMF. target_core_user would then call
> into tcmu-runner and that would call into the userspace handler.
>
> 2. For tcmu-runner restarts, I am working on the ability to block
> backends, so at restart time tcmur-runner can loop over the oustanding
> commands and call into handlers to figure out if they are running still.
> If they are, I am debating what to do still. We could have tcmu runner
> will call the handler callouts used for #1 if needed.

Originally, the shared memory area kind of served as a recoverable 
journal of cmds issued to userspace: if userspace died then they'd still 
be there when userspace restarted and re-mmap()ed the area, and it could 
pick up and continue.

However this stopped being the case when we enabled cmds to be completed 
out of order. We may reuse an entry A on the TCMU cmd ring to return 
status for cmd B, which means that if userspace chooses that moment to 
crash then a restarted userspace will not know it still needs to do A 
(as well as probably doing B twice). Userspace must assume the burden of 
remembering its state, or refuse to work on cmds in the cmd ring when 
restarted. Mike's plan #1 is important to clear out these hung commands, 
and then #2, because tcmu-runner has its own internal modularization 
("handlers") and maybe tcmu-runner doesn't even know the state of 
in-flight cmds, but a handler does. Fun.

Anyway, I don't know if I'm adding any new info here but just adding my 2c.

Regards -- Andy

p.s. just thinking here... it would, with some work, be possible to add 
a separate cmd completion ring backwards-compatibly. Haven't thought it 
through but just tossing that out there.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nicholas A. Bellinger March 9, 2017, 4:16 a.m. UTC | #4

On Wed, 2017-03-08 at 12:02 -0600, Mike Christie wrote:
> On 03/08/2017 03:05 AM, Nicholas A. Bellinger wrote:
> > Hi MNC,
> > 
> > I know Andy has stepped back from TCMU, but I wanted to get his input
> > (CC'ed) as the original author.
> > 
> > On Wed, 2017-03-01 at 23:14 -0600, Mike Christie wrote:
> >> This patch removes the tcmu the cmd timeout code. Like with the core
> >> target_core_tmr.c code, the kernel does not know the state of the command
> >> or implemenation details of the device, so it might not be safe to just
> >> return the command to the initiator for a possible retry.
> >>
> >> Signed-off-by: Mike Christie <mchristi@redhat.com>
> > 
> > Applied for the moment, but I'm curious wrt the implications for this..?
> > 
> > Does this mean all outstanding se_cmd descriptors dispatched into
> > user-space must always complete outstanding I/Os, otherwise target-core
> > will block indefinitely waiting for completion a la other in-kernel
> > backend drivers..? 
> 
> Yes.
> 
> > 
> > Historically we've always considered a backend device that never
> > completes outstanding se_cmd descriptors to be a buggy driver.
> > But existing user-space code that uses TCMU (I assume) is allowed to
> > restart, and is currently not responsible for completing I/Os after the
> > restart..?  Is that the case..?
> > 
> > I wonder if this change may end up being problematic because of this
> > assumption by user-space..?
> 
> I think we need a module param or per device flag.
> 
> There is this kernel timeout code that handles the easy part where we
> can get se_cmd descriptors completed.
> 
> There is no tcmu-runner restart code to figure out the state of those
> commands. For example if we were in the WRITE part of a
> COMPARE_AND_WRITE and the WRITE is floating around at the time of the
> tcmu-runner crash, then when it starts up again, we would just allow 2
> COMPARE_AND_WRITEs to be executing at the same time.
> 
> However, I see your point that other daemons could be implementing the
> needed cleanup and be safe.
> 
> How about a modparam that specifies the timeout length. If set to 0,
> then we do not timeout internally. tcmu-runner would set this. The
> default is 30 like it is currently.

Makes sense for existing compatibility.

Even better than modparam, add a TCMU specific configfs attribute via
tcmu_ops->tb_dev_attrib_attrs and expose it per device.

> 
> My tcmu-runner plan was:
> 
> 1. For hung commands, we are going to add callouts to the backend module
> for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
> to have the backend perform the TMF. target_core_user would then call
> into tcmu-runner and that would call into the userspace handler.
> 

Mmmm.  This would mean TMRs would function very different between TCMU
and non TCMU backends..

We've always had the assumption that once a TMR is processed, code
blocks until the backend completes outstanding se_cmd(s) in question
without an explicit cancellation trigger.

One approach I've taken for a out-of-tree make_request_fn() based bio
driver is return bios with -EAGAIN if a bio takes more than say 3
seconds to complete, in order to signal to IBLOCK to propagate up a
SAM_STAT_BUSY or SAM_STAT_QUEUE_FULL.  This ends up working quite well
to keep initiators happy, and virtually eliminates spurious TMRs in host
environments (like ESX for example) with a very low SCSI timeout. 

Of course, this pushes alot of smarts into the driver below the backend,
but in a distributed scale out backend where I/Os delays, et al. are a
fact of life this is difficult to avoid.

So that said, I'd prefer to have TCMU's user-space code return back I/Os
that are expected to take a long time with SAM_STAT_BUSY or
SAM_STAT_QUEUE_FULL (to reduce host queue_depth), instead of adding a
explicit se_cmd cancellation mechanism in target-core into backend
driver code.

I'd be happy to post the target-core changes needed to do this, which
where posted at one point but ended up not getting merged. 

> 2. For tcmu-runner restarts, I am working on the ability to block
> backends, so at restart time tcmur-runner can loop over the oustanding
> commands and call into handlers to figure out if they are running still.
> If they are, I am debating what to do still. We could have tcmu runner
> will call the handler callouts used for #1 if needed.
> 

Yeah, returning back with SAM_STAT_BUSY or SAM_STAT_QUEUE_FULL would be
good idea here too.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nicholas A. Bellinger March 9, 2017, 4:22 a.m. UTC | #5

On Wed, 2017-03-08 at 20:16 -0800, Nicholas A. Bellinger wrote:
> On Wed, 2017-03-08 at 12:02 -0600, Mike Christie wrote:

<SNIP>

> > 
> > My tcmu-runner plan was:
> > 
> > 1. For hung commands, we are going to add callouts to the backend module
> > for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
> > to have the backend perform the TMF. target_core_user would then call
> > into tcmu-runner and that would call into the userspace handler.
> > 
> 
> Mmmm.  This would mean TMRs would function very different between TCMU
> and non TCMU backends..
> 
> We've always had the assumption that once a TMR is processed, code
> blocks until the backend completes outstanding se_cmd(s) in question
> without an explicit cancellation trigger.
> 
> One approach I've taken for a out-of-tree make_request_fn() based bio
> driver is return bios with -EAGAIN if a bio takes more than say 3
> seconds to complete, in order to signal to IBLOCK to propagate up a
> SAM_STAT_BUSY or SAM_STAT_QUEUE_FULL.  This ends up working quite well
> to keep initiators happy, and virtually eliminates spurious TMRs in host
> environments (like ESX for example) with a very low SCSI timeout. 
> 
> Of course, this pushes alot of smarts into the driver below the backend,
> but in a distributed scale out backend where I/Os delays, et al. are a
> fact of life this is difficult to avoid.
> 
> So that said, I'd prefer to have TCMU's user-space code return back I/Os
> that are expected to take a long time with SAM_STAT_BUSY or
> SAM_STAT_QUEUE_FULL (to reduce host queue_depth), instead of adding a
> explicit se_cmd cancellation mechanism in target-core into backend
> driver code.
> 
> I'd be happy to post the target-core changes needed to do this, which
> where posted at one point but ended up not getting merged. 

Btw, here is the patch in question:

http://www.spinics.net/lists/target-devel/msg11835.html


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mike Christie March 9, 2017, 8:53 a.m. UTC | #6

On 03/08/2017 10:22 PM, Nicholas A. Bellinger wrote:
> On Wed, 2017-03-08 at 20:16 -0800, Nicholas A. Bellinger wrote:
>> On Wed, 2017-03-08 at 12:02 -0600, Mike Christie wrote:
> 
> <SNIP>
> 
>>>
>>> My tcmu-runner plan was:
>>>
>>> 1. For hung commands, we are going to add callouts to the backend module
>>> for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
>>> to have the backend perform the TMF. target_core_user would then call
>>> into tcmu-runner and that would call into the userspace handler.
>>>
>>
>> Mmmm.  This would mean TMRs would function very different between TCMU
>> and non TCMU backends..

Do we ever want support for that? I actually started implementing
support so lio backends like iblock could call into the initiator
driver's error handlers:

http://www.spinics.net/lists/linux-scsi/msg97064.html

For tcmu, your idea below works nicely. I will work on it. Thanks.

>>
>> We've always had the assumption that once a TMR is processed, code
>> blocks until the backend completes outstanding se_cmd(s) in question
>> without an explicit cancellation trigger.
>>
>> One approach I've taken for a out-of-tree make_request_fn() based bio
>> driver is return bios with -EAGAIN if a bio takes more than say 3
>> seconds to complete, in order to signal to IBLOCK to propagate up a
>> SAM_STAT_BUSY or SAM_STAT_QUEUE_FULL.  This ends up working quite well
>> to keep initiators happy, and virtually eliminates spurious TMRs in host
>> environments (like ESX for example) with a very low SCSI timeout. 
>>
>> Of course, this pushes alot of smarts into the driver below the backend,
>> but in a distributed scale out backend where I/Os delays, et al. are a
>> fact of life this is difficult to avoid.
>>
>> So that said, I'd prefer to have TCMU's user-space code return back I/Os
>> that are expected to take a long time with SAM_STAT_BUSY or
>> SAM_STAT_QUEUE_FULL (to reduce host queue_depth), instead of adding a
>> explicit se_cmd cancellation mechanism in target-core into backend
>> driver code.
>>
>> I'd be happy to post the target-core changes needed to do this, which
>> where posted at one point but ended up not getting merged. 
> 
> Btw, here is the patch in question:
> 
> http://www.spinics.net/lists/target-devel/msg11835.html
>
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nicholas A. Bellinger March 19, 2017, 1:51 a.m. UTC | #7

On Thu, 2017-03-09 at 02:53 -0600, Mike Christie wrote:
> On 03/08/2017 10:22 PM, Nicholas A. Bellinger wrote:
> > On Wed, 2017-03-08 at 20:16 -0800, Nicholas A. Bellinger wrote:
> >> On Wed, 2017-03-08 at 12:02 -0600, Mike Christie wrote:
> > 
> > <SNIP>
> > 
> >>>
> >>> My tcmu-runner plan was:
> >>>
> >>> 1. For hung commands, we are going to add callouts to the backend module
> >>> for TMFs. target_core_tmr.c would call them during TMFs like ABORT_TASK
> >>> to have the backend perform the TMF. target_core_user would then call
> >>> into tcmu-runner and that would call into the userspace handler.
> >>>
> >>
> >> Mmmm.  This would mean TMRs would function very different between TCMU
> >> and non TCMU backends..
> 
> Do we ever want support for that? I actually started implementing
> support so lio backends like iblock could call into the initiator
> driver's error handlers:
> 
> http://www.spinics.net/lists/linux-scsi/msg97064.html
> 

Propagating target LUN_RESETs into LLD driver code, specifically in
cases where the LLD has a higher host reset timeout would certainly be
useful to reduce the overall end-to-end host-reset recovery time.

That said, I don't think the various pros/cons of this approach have
ever been verbalized on the list, so I'd like to give it some more
thought to understand what the downsides (if any) are.

> For tcmu, your idea below works nicely. I will work on it. Thanks.
> 

Ok, I'll post an updated patch to make target_complete_cmd() handle
SAM_STAT_BUSY + SAM_STAT_QUEUE_FULL as a v4.12 item.

> >>
> >> We've always had the assumption that once a TMR is processed, code
> >> blocks until the backend completes outstanding se_cmd(s) in question
> >> without an explicit cancellation trigger.
> >>
> >> One approach I've taken for a out-of-tree make_request_fn() based bio
> >> driver is return bios with -EAGAIN if a bio takes more than say 3
> >> seconds to complete, in order to signal to IBLOCK to propagate up a
> >> SAM_STAT_BUSY or SAM_STAT_QUEUE_FULL.  This ends up working quite well
> >> to keep initiators happy, and virtually eliminates spurious TMRs in host
> >> environments (like ESX for example) with a very low SCSI timeout. 
> >>
> >> Of course, this pushes alot of smarts into the driver below the backend,
> >> but in a distributed scale out backend where I/Os delays, et al. are a
> >> fact of life this is difficult to avoid.
> >>
> >> So that said, I'd prefer to have TCMU's user-space code return back I/Os
> >> that are expected to take a long time with SAM_STAT_BUSY or
> >> SAM_STAT_QUEUE_FULL (to reduce host queue_depth), instead of adding a
> >> explicit se_cmd cancellation mechanism in target-core into backend
> >> driver code.
> >>
> >> I'd be happy to post the target-core changes needed to do this, which
> >> where posted at one point but ended up not getting merged. 
> > 
> > Btw, here is the patch in question:
> > 
> > http://www.spinics.net/lists/target-devel/msg11835.html
> >
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[3/3] tcmu: remove cmd timeout code

Commit Message

Comments

Patch