[09/10] crypto: add timeout to crypto_wait_req

Message ID	20191017122549.4634-10-t-kristo@ti.com (mailing list archive)
State	Changes Requested
Delegated to:	Herbert Xu
Headers	show Return-Path: <SRS0=Ic7o=YK=vger.kernel.org=linux-crypto-owner@kernel.org> From: Tero Kristo <t-kristo@ti.com> To: <herbert@gondor.apana.org.au>, <davem@davemloft.net>, <linux-crypto@vger.kernel.org>, <ard.biesheuvel@linaro.org> CC: <linux-omap@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org> Subject: [PATCH 09/10] crypto: add timeout to crypto_wait_req Date: Thu, 17 Oct 2019 15:25:48 +0300 Message-ID: <20191017122549.4634-10-t-kristo@ti.com> In-Reply-To: <20191017122549.4634-1-t-kristo@ti.com> References: <20191017122549.4634-1-t-kristo@ti.com> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk
Series	crypto: omap fixes towards 5.5 \| expand [00/10] crypto: omap fixes towards 5.5 [01/10] crypto: omap-sham: split up data to multiple sg elements with huge data [02/10] crypto: omap-sham: remove the sysfs group during driver removal [03/10] crypto: omap-aes: remove the sysfs group during driver removal [04/10] crypto: omap-des: add IV output handling [05/10] crypto: omap-aes: add IV output handling [06/10] crypto: omap-sham: fix buffer handling for split test cases [07/10] crypto: omap-aes-gcm: fix corner case with only auth data [08/10] crypto: omap-sham: fix split update cases with cryptomgr tests [09/10] crypto: add timeout to crypto_wait_req [10/10] crypto: omap-aes: fixup aligned data cleanup

Tero Kristo Oct. 17, 2019, 12:25 p.m. UTC

Currently crypto_wait_req waits indefinitely for an async crypto request
to complete. This is bad as it can cause for example the crypto test
manager to hang without any notification as to why it has happened.
Instead of waiting indefinitely, add a 1 second timeout to the call,
and provide a warning print if a timeout happens.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 include/linux/crypto.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Eric Biggers Nov. 5, 2019, 5:42 p.m. UTC | #1

On Thu, Oct 17, 2019 at 03:25:48PM +0300, Tero Kristo wrote:
> Currently crypto_wait_req waits indefinitely for an async crypto request
> to complete. This is bad as it can cause for example the crypto test
> manager to hang without any notification as to why it has happened.
> Instead of waiting indefinitely, add a 1 second timeout to the call,
> and provide a warning print if a timeout happens.
> 
> Signed-off-by: Tero Kristo <t-kristo@ti.com>
> ---
>  include/linux/crypto.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
> index 19ea3a371d7b..b8f0e5c3cc0c 100644
> --- a/include/linux/crypto.h
> +++ b/include/linux/crypto.h
> @@ -682,8 +682,15 @@ static inline int crypto_wait_req(int err, struct crypto_wait *wait)
>  	switch (err) {
>  	case -EINPROGRESS:
>  	case -EBUSY:
> -		wait_for_completion(&wait->completion);
> +		err = wait_for_completion_timeout(&wait->completion,
> +						  msecs_to_jiffies(1000));
>  		reinit_completion(&wait->completion);
> +		if (!err) {
> +			pr_err("%s: timeout for %p\n", __func__, wait);
> +			err = -ETIMEDOUT;
> +			break;
> +		}
> +
>  		err = wait->err;
>  		break;
>  	};

I'm not sure this is a good idea, because operations could legitimately take a
long time, e.g. if someone passes in a huge data buffer.  How do you know that X
amount of time is always going to be enough?

- Eric

Gilad Ben-Yossef Nov. 6, 2019, 6:39 a.m. UTC | #2

Hi,

On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote:
>
> Currently crypto_wait_req waits indefinitely for an async crypto request
> to complete. This is bad as it can cause for example the crypto test
> manager to hang without any notification as to why it has happened.
> Instead of waiting indefinitely, add a 1 second timeout to the call,
> and provide a warning print if a timeout happens.

While the incentive is clear and positive, this suggested solution
creates problems of its own.
In many (most?) cases where we are waiting here, we are waiting for a
DMA operation to finish from hardware.
Exiting while this pending DMA operation is not finished, even with a
proper error return value, is dangerous because
unless the calling code takes great care to not release the memory the
DMA is being done from/to, this can have disastrous effects.

As Eric has already mentioned, one second might seem like a long time,
but we don't really know if it is enough.

How about adding a second API (ig. crypto_wait_req_timeout) which
supports a calee specified timeout where
the calle knows how to correctly deal with timeout and port the
relevant call sites to use this?

Thanks!
Gilad

>
> Signed-off-by: Tero Kristo <t-kristo@ti.com>
> ---
>  include/linux/crypto.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
> index 19ea3a371d7b..b8f0e5c3cc0c 100644
> --- a/include/linux/crypto.h
> +++ b/include/linux/crypto.h
> @@ -682,8 +682,15 @@ static inline int crypto_wait_req(int err, struct crypto_wait *wait)
>         switch (err) {
>         case -EINPROGRESS:
>         case -EBUSY:
> -               wait_for_completion(&wait->completion);
> +               err = wait_for_completion_timeout(&wait->completion,
> +                                                 msecs_to_jiffies(1000));
>                 reinit_completion(&wait->completion);
> +               if (!err) {
> +                       pr_err("%s: timeout for %p\n", __func__, wait);
> +                       err = -ETIMEDOUT;
> +                       break;
> +               }
> +
>                 err = wait->err;
>                 break;
>         };
> --
> 2.17.1
>
> --
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

--
Gilad Ben-Yossef
Chief Coffee Drinker

values of β will give rise to dom!

Tero Kristo Nov. 6, 2019, 7:25 a.m. UTC | #3

On 06/11/2019 08:39, Gilad Ben-Yossef wrote:
> Hi,
> 
> 
> On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote:
>>
>> Currently crypto_wait_req waits indefinitely for an async crypto request
>> to complete. This is bad as it can cause for example the crypto test
>> manager to hang without any notification as to why it has happened.
>> Instead of waiting indefinitely, add a 1 second timeout to the call,
>> and provide a warning print if a timeout happens.
> 
> While the incentive is clear and positive, this suggested solution
> creates problems of its own.
> In many (most?) cases where we are waiting here, we are waiting for a
> DMA operation to finish from hardware.
> Exiting while this pending DMA operation is not finished, even with a
> proper error return value, is dangerous because
> unless the calling code takes great care to not release the memory the
> DMA is being done from/to, this can have disastrous effects.
> 
> As Eric has already mentioned, one second might seem like a long time,
> but we don't really know if it is enough.
> 
> How about adding a second API (ig. crypto_wait_req_timeout) which
> supports a calee specified timeout where
> the calle knows how to correctly deal with timeout and port the
> relevant call sites to use this?

Yeah, that would work for me. I guess we could just swap the testmgr to 
use this timeout API, as it is quite clear it should timeout rather than 
wait indefinitely, and afaics, the data buffers it uses are limited 
size. It doesn't really matter for it whether the timeout is 1 second or 
10 seconds, as long as it eventually times out.

-Tero

> 
> Thanks!
> Gilad
> 
> 
>>
>> Signed-off-by: Tero Kristo <t-kristo@ti.com>
>> ---
>>   include/linux/crypto.h | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
>> index 19ea3a371d7b..b8f0e5c3cc0c 100644
>> --- a/include/linux/crypto.h
>> +++ b/include/linux/crypto.h
>> @@ -682,8 +682,15 @@ static inline int crypto_wait_req(int err, struct crypto_wait *wait)
>>          switch (err) {
>>          case -EINPROGRESS:
>>          case -EBUSY:
>> -               wait_for_completion(&wait->completion);
>> +               err = wait_for_completion_timeout(&wait->completion,
>> +                                                 msecs_to_jiffies(1000));
>>                  reinit_completion(&wait->completion);
>> +               if (!err) {
>> +                       pr_err("%s: timeout for %p\n", __func__, wait);
>> +                       err = -ETIMEDOUT;
>> +                       break;
>> +               }
>> +
>>                  err = wait->err;
>>                  break;
>>          };
>> --
>> 2.17.1
>>
>> --
> 
> 
> 
> --
> Gilad Ben-Yossef
> Chief Coffee Drinker
> 
> values of β will give rise to dom!
> 

--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

Gilad Ben-Yossef Nov. 6, 2019, 7:33 a.m. UTC | #4

On Wed, Nov 6, 2019 at 9:25 AM Tero Kristo <t-kristo@ti.com> wrote:
>
> On 06/11/2019 08:39, Gilad Ben-Yossef wrote:
> > Hi,
> >
> >
> > On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote:
> >>
> >> Currently crypto_wait_req waits indefinitely for an async crypto request
> >> to complete. This is bad as it can cause for example the crypto test
> >> manager to hang without any notification as to why it has happened.
> >> Instead of waiting indefinitely, add a 1 second timeout to the call,
> >> and provide a warning print if a timeout happens.
> >
> > While the incentive is clear and positive, this suggested solution
> > creates problems of its own.
> > In many (most?) cases where we are waiting here, we are waiting for a
> > DMA operation to finish from hardware.
> > Exiting while this pending DMA operation is not finished, even with a
> > proper error return value, is dangerous because
> > unless the calling code takes great care to not release the memory the
> > DMA is being done from/to, this can have disastrous effects.
> >
> > As Eric has already mentioned, one second might seem like a long time,
> > but we don't really know if it is enough.
> >
> > How about adding a second API (ig. crypto_wait_req_timeout) which
> > supports a calee specified timeout where
> > the calle knows how to correctly deal with timeout and port the
> > relevant call sites to use this?
>
> Yeah, that would work for me. I guess we could just swap the testmgr to
> use this timeout API, as it is quite clear it should timeout rather than
> wait indefinitely, and afaics, the data buffers it uses are limited
> size. It doesn't really matter for it whether the timeout is 1 second or
> 10 seconds, as long as it eventually times out.


As long as you avoid releasing the memory used on timeout, that should
work well, I think.

Gilad

Eric Biggers Nov. 8, 2019, 2:27 a.m. UTC | #5

On Wed, Nov 06, 2019 at 09:33:20AM +0200, Gilad Ben-Yossef wrote:
> On Wed, Nov 6, 2019 at 9:25 AM Tero Kristo <t-kristo@ti.com> wrote:
> >
> > On 06/11/2019 08:39, Gilad Ben-Yossef wrote:
> > > Hi,
> > >
> > >
> > > On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote:
> > >>
> > >> Currently crypto_wait_req waits indefinitely for an async crypto request
> > >> to complete. This is bad as it can cause for example the crypto test
> > >> manager to hang without any notification as to why it has happened.
> > >> Instead of waiting indefinitely, add a 1 second timeout to the call,
> > >> and provide a warning print if a timeout happens.
> > >
> > > While the incentive is clear and positive, this suggested solution
> > > creates problems of its own.
> > > In many (most?) cases where we are waiting here, we are waiting for a
> > > DMA operation to finish from hardware.
> > > Exiting while this pending DMA operation is not finished, even with a
> > > proper error return value, is dangerous because
> > > unless the calling code takes great care to not release the memory the
> > > DMA is being done from/to, this can have disastrous effects.
> > >
> > > As Eric has already mentioned, one second might seem like a long time,
> > > but we don't really know if it is enough.
> > >
> > > How about adding a second API (ig. crypto_wait_req_timeout) which
> > > supports a calee specified timeout where
> > > the calle knows how to correctly deal with timeout and port the
> > > relevant call sites to use this?
> >
> > Yeah, that would work for me. I guess we could just swap the testmgr to
> > use this timeout API, as it is quite clear it should timeout rather than
> > wait indefinitely, and afaics, the data buffers it uses are limited
> > size. It doesn't really matter for it whether the timeout is 1 second or
> > 10 seconds, as long as it eventually times out.
> 
> 
> As long as you avoid releasing the memory used on timeout, that should
> work well, I think.
> 

The memory is always going to be freed eventually, though.  Although the crypto
tests currently reuse the input/output buffers and the request structure from
one test to the next, they're freed at the end of the tests.  Also, it's unsafe
for one request structure to be used for multiple requests concurrently anyway.

I think crypto_wait_req_timeout() would just be fundamentally unsafe.

Couldn't you just use CONFIG_DETECT_HUNG_TASK=y instead?  It should report if
any thread is blocked for too long.

- Eric

Tero Kristo Nov. 8, 2019, 7:40 a.m. UTC | #6

On 08/11/2019 04:27, Eric Biggers wrote:
> On Wed, Nov 06, 2019 at 09:33:20AM +0200, Gilad Ben-Yossef wrote:
>> On Wed, Nov 6, 2019 at 9:25 AM Tero Kristo <t-kristo@ti.com> wrote:
>>>
>>> On 06/11/2019 08:39, Gilad Ben-Yossef wrote:
>>>> Hi,
>>>>
>>>>
>>>> On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote:
>>>>>
>>>>> Currently crypto_wait_req waits indefinitely for an async crypto request
>>>>> to complete. This is bad as it can cause for example the crypto test
>>>>> manager to hang without any notification as to why it has happened.
>>>>> Instead of waiting indefinitely, add a 1 second timeout to the call,
>>>>> and provide a warning print if a timeout happens.
>>>>
>>>> While the incentive is clear and positive, this suggested solution
>>>> creates problems of its own.
>>>> In many (most?) cases where we are waiting here, we are waiting for a
>>>> DMA operation to finish from hardware.
>>>> Exiting while this pending DMA operation is not finished, even with a
>>>> proper error return value, is dangerous because
>>>> unless the calling code takes great care to not release the memory the
>>>> DMA is being done from/to, this can have disastrous effects.
>>>>
>>>> As Eric has already mentioned, one second might seem like a long time,
>>>> but we don't really know if it is enough.
>>>>
>>>> How about adding a second API (ig. crypto_wait_req_timeout) which
>>>> supports a calee specified timeout where
>>>> the calle knows how to correctly deal with timeout and port the
>>>> relevant call sites to use this?
>>>
>>> Yeah, that would work for me. I guess we could just swap the testmgr to
>>> use this timeout API, as it is quite clear it should timeout rather than
>>> wait indefinitely, and afaics, the data buffers it uses are limited
>>> size. It doesn't really matter for it whether the timeout is 1 second or
>>> 10 seconds, as long as it eventually times out.
>>
>>
>> As long as you avoid releasing the memory used on timeout, that should
>> work well, I think.
>>
> 
> The memory is always going to be freed eventually, though.  Although the crypto
> tests currently reuse the input/output buffers and the request structure from
> one test to the next, they're freed at the end of the tests.  Also, it's unsafe
> for one request structure to be used for multiple requests concurrently anyway.
> 
> I think crypto_wait_req_timeout() would just be fundamentally unsafe.
> 
> Couldn't you just use CONFIG_DETECT_HUNG_TASK=y instead?  It should report if
> any thread is blocked for too long.

The problem is not detecting a hung task, the problem is determining 
what caused the hang. Personally I don't care if the system dies if a 
crypto accelerator self test has failed, as long as I get reported about 
the exact nature of the failure. The failures are expected to happen 
only in development phase of a crypto driver.

With the timeout patch in place, I get reported what exact crypto test 
case failed and I can focus my debug efforts on that one.

Anyways, as said this is just a nice to have patch, and can be dropped 
no issues there. I was just thinking some other people might find it 
useful also.

-Tero
--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

Herbert Xu Nov. 8, 2019, 9:16 a.m. UTC | #7

On Fri, Nov 08, 2019 at 09:40:57AM +0200, Tero Kristo wrote:
>
> The problem is not detecting a hung task, the problem is determining what
> caused the hang. Personally I don't care if the system dies if a crypto
> accelerator self test has failed, as long as I get reported about the exact
> nature of the failure. The failures are expected to happen only in
> development phase of a crypto driver.
> 
> With the timeout patch in place, I get reported what exact crypto test case
> failed and I can focus my debug efforts on that one.

If that's all you need then how about just making the wait killable?

Cheers,

Tero Kristo Nov. 8, 2019, 9:22 a.m. UTC | #8

On 08/11/2019 11:16, Herbert Xu wrote:
> On Fri, Nov 08, 2019 at 09:40:57AM +0200, Tero Kristo wrote:
>>
>> The problem is not detecting a hung task, the problem is determining what
>> caused the hang. Personally I don't care if the system dies if a crypto
>> accelerator self test has failed, as long as I get reported about the exact
>> nature of the failure. The failures are expected to happen only in
>> development phase of a crypto driver.
>>
>> With the timeout patch in place, I get reported what exact crypto test case
>> failed and I can focus my debug efforts on that one.
> 
> If that's all you need then how about just making the wait killable?

Yeah, that would be an alternative.

-Tero
--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

Eric Biggers Nov. 9, 2019, 2:27 a.m. UTC | #9

On Fri, Nov 08, 2019 at 11:22:48AM +0200, Tero Kristo wrote:
> On 08/11/2019 11:16, Herbert Xu wrote:
> > On Fri, Nov 08, 2019 at 09:40:57AM +0200, Tero Kristo wrote:
> > > 
> > > The problem is not detecting a hung task, the problem is determining what
> > > caused the hang. Personally I don't care if the system dies if a crypto
> > > accelerator self test has failed, as long as I get reported about the exact
> > > nature of the failure. The failures are expected to happen only in
> > > development phase of a crypto driver.
> > > 
> > > With the timeout patch in place, I get reported what exact crypto test case
> > > failed and I can focus my debug efforts on that one.
> > 
> > If that's all you need then how about just making the wait killable?
> 
> Yeah, that would be an alternative.
> 

I don't see how making crypto_wait_req killable would be any better than adding
a timeout, since in both cases the crypto operation would still be proceeding in
the background while things are being freed.

Would it help if the crypto self-tests printed a pr_debug() message when
starting each test vector?  These wouldn't be shown by default, but it would be
possible to enable them using dynamic-debug or by adding '#define DEBUG' to the
top of the source file.

- Eric

Herbert Xu Nov. 9, 2019, 5:01 a.m. UTC | #10

On Fri, Nov 08, 2019 at 06:27:49PM -0800, Eric Biggers wrote:
> 
> I don't see how making crypto_wait_req killable would be any better than adding
> a timeout, since in both cases the crypto operation would still be proceeding in
> the background while things are being freed.

Right, you would need to modify the caller to actually distinguish
between the killed case vs. actual completion.

> Would it help if the crypto self-tests printed a pr_debug() message when
> starting each test vector?  These wouldn't be shown by default, but it would be
> possible to enable them using dynamic-debug or by adding '#define DEBUG' to the
> top of the source file.

This should be simpler to implement.

Thanks,

[09/10] crypto: add timeout to crypto_wait_req

Commit Message

Comments

Patch