Message ID | 20191017122549.4634-10-t-kristo@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | crypto: omap fixes towards 5.5 | expand |
On Thu, Oct 17, 2019 at 03:25:48PM +0300, Tero Kristo wrote: > Currently crypto_wait_req waits indefinitely for an async crypto request > to complete. This is bad as it can cause for example the crypto test > manager to hang without any notification as to why it has happened. > Instead of waiting indefinitely, add a 1 second timeout to the call, > and provide a warning print if a timeout happens. > > Signed-off-by: Tero Kristo <t-kristo@ti.com> > --- > include/linux/crypto.h | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/include/linux/crypto.h b/include/linux/crypto.h > index 19ea3a371d7b..b8f0e5c3cc0c 100644 > --- a/include/linux/crypto.h > +++ b/include/linux/crypto.h > @@ -682,8 +682,15 @@ static inline int crypto_wait_req(int err, struct crypto_wait *wait) > switch (err) { > case -EINPROGRESS: > case -EBUSY: > - wait_for_completion(&wait->completion); > + err = wait_for_completion_timeout(&wait->completion, > + msecs_to_jiffies(1000)); > reinit_completion(&wait->completion); > + if (!err) { > + pr_err("%s: timeout for %p\n", __func__, wait); > + err = -ETIMEDOUT; > + break; > + } > + > err = wait->err; > break; > }; I'm not sure this is a good idea, because operations could legitimately take a long time, e.g. if someone passes in a huge data buffer. How do you know that X amount of time is always going to be enough? - Eric
Hi, On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote: > > Currently crypto_wait_req waits indefinitely for an async crypto request > to complete. This is bad as it can cause for example the crypto test > manager to hang without any notification as to why it has happened. > Instead of waiting indefinitely, add a 1 second timeout to the call, > and provide a warning print if a timeout happens. While the incentive is clear and positive, this suggested solution creates problems of its own. In many (most?) cases where we are waiting here, we are waiting for a DMA operation to finish from hardware. Exiting while this pending DMA operation is not finished, even with a proper error return value, is dangerous because unless the calling code takes great care to not release the memory the DMA is being done from/to, this can have disastrous effects. As Eric has already mentioned, one second might seem like a long time, but we don't really know if it is enough. How about adding a second API (ig. crypto_wait_req_timeout) which supports a calee specified timeout where the calle knows how to correctly deal with timeout and port the relevant call sites to use this? Thanks! Gilad > > Signed-off-by: Tero Kristo <t-kristo@ti.com> > --- > include/linux/crypto.h | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/include/linux/crypto.h b/include/linux/crypto.h > index 19ea3a371d7b..b8f0e5c3cc0c 100644 > --- a/include/linux/crypto.h > +++ b/include/linux/crypto.h > @@ -682,8 +682,15 @@ static inline int crypto_wait_req(int err, struct crypto_wait *wait) > switch (err) { > case -EINPROGRESS: > case -EBUSY: > - wait_for_completion(&wait->completion); > + err = wait_for_completion_timeout(&wait->completion, > + msecs_to_jiffies(1000)); > reinit_completion(&wait->completion); > + if (!err) { > + pr_err("%s: timeout for %p\n", __func__, wait); > + err = -ETIMEDOUT; > + break; > + } > + > err = wait->err; > break; > }; > -- > 2.17.1 > > -- > Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki -- Gilad Ben-Yossef Chief Coffee Drinker values of β will give rise to dom!
On 06/11/2019 08:39, Gilad Ben-Yossef wrote: > Hi, > > > On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote: >> >> Currently crypto_wait_req waits indefinitely for an async crypto request >> to complete. This is bad as it can cause for example the crypto test >> manager to hang without any notification as to why it has happened. >> Instead of waiting indefinitely, add a 1 second timeout to the call, >> and provide a warning print if a timeout happens. > > While the incentive is clear and positive, this suggested solution > creates problems of its own. > In many (most?) cases where we are waiting here, we are waiting for a > DMA operation to finish from hardware. > Exiting while this pending DMA operation is not finished, even with a > proper error return value, is dangerous because > unless the calling code takes great care to not release the memory the > DMA is being done from/to, this can have disastrous effects. > > As Eric has already mentioned, one second might seem like a long time, > but we don't really know if it is enough. > > How about adding a second API (ig. crypto_wait_req_timeout) which > supports a calee specified timeout where > the calle knows how to correctly deal with timeout and port the > relevant call sites to use this? Yeah, that would work for me. I guess we could just swap the testmgr to use this timeout API, as it is quite clear it should timeout rather than wait indefinitely, and afaics, the data buffers it uses are limited size. It doesn't really matter for it whether the timeout is 1 second or 10 seconds, as long as it eventually times out. -Tero > > Thanks! > Gilad > > >> >> Signed-off-by: Tero Kristo <t-kristo@ti.com> >> --- >> include/linux/crypto.h | 9 ++++++++- >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/crypto.h b/include/linux/crypto.h >> index 19ea3a371d7b..b8f0e5c3cc0c 100644 >> --- a/include/linux/crypto.h >> +++ b/include/linux/crypto.h >> @@ -682,8 +682,15 @@ static inline int crypto_wait_req(int err, struct crypto_wait *wait) >> switch (err) { >> case -EINPROGRESS: >> case -EBUSY: >> - wait_for_completion(&wait->completion); >> + err = wait_for_completion_timeout(&wait->completion, >> + msecs_to_jiffies(1000)); >> reinit_completion(&wait->completion); >> + if (!err) { >> + pr_err("%s: timeout for %p\n", __func__, wait); >> + err = -ETIMEDOUT; >> + break; >> + } >> + >> err = wait->err; >> break; >> }; >> -- >> 2.17.1 >> >> -- > > > > -- > Gilad Ben-Yossef > Chief Coffee Drinker > > values of β will give rise to dom! > -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On Wed, Nov 6, 2019 at 9:25 AM Tero Kristo <t-kristo@ti.com> wrote: > > On 06/11/2019 08:39, Gilad Ben-Yossef wrote: > > Hi, > > > > > > On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote: > >> > >> Currently crypto_wait_req waits indefinitely for an async crypto request > >> to complete. This is bad as it can cause for example the crypto test > >> manager to hang without any notification as to why it has happened. > >> Instead of waiting indefinitely, add a 1 second timeout to the call, > >> and provide a warning print if a timeout happens. > > > > While the incentive is clear and positive, this suggested solution > > creates problems of its own. > > In many (most?) cases where we are waiting here, we are waiting for a > > DMA operation to finish from hardware. > > Exiting while this pending DMA operation is not finished, even with a > > proper error return value, is dangerous because > > unless the calling code takes great care to not release the memory the > > DMA is being done from/to, this can have disastrous effects. > > > > As Eric has already mentioned, one second might seem like a long time, > > but we don't really know if it is enough. > > > > How about adding a second API (ig. crypto_wait_req_timeout) which > > supports a calee specified timeout where > > the calle knows how to correctly deal with timeout and port the > > relevant call sites to use this? > > Yeah, that would work for me. I guess we could just swap the testmgr to > use this timeout API, as it is quite clear it should timeout rather than > wait indefinitely, and afaics, the data buffers it uses are limited > size. It doesn't really matter for it whether the timeout is 1 second or > 10 seconds, as long as it eventually times out. As long as you avoid releasing the memory used on timeout, that should work well, I think. Gilad
On Wed, Nov 06, 2019 at 09:33:20AM +0200, Gilad Ben-Yossef wrote: > On Wed, Nov 6, 2019 at 9:25 AM Tero Kristo <t-kristo@ti.com> wrote: > > > > On 06/11/2019 08:39, Gilad Ben-Yossef wrote: > > > Hi, > > > > > > > > > On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote: > > >> > > >> Currently crypto_wait_req waits indefinitely for an async crypto request > > >> to complete. This is bad as it can cause for example the crypto test > > >> manager to hang without any notification as to why it has happened. > > >> Instead of waiting indefinitely, add a 1 second timeout to the call, > > >> and provide a warning print if a timeout happens. > > > > > > While the incentive is clear and positive, this suggested solution > > > creates problems of its own. > > > In many (most?) cases where we are waiting here, we are waiting for a > > > DMA operation to finish from hardware. > > > Exiting while this pending DMA operation is not finished, even with a > > > proper error return value, is dangerous because > > > unless the calling code takes great care to not release the memory the > > > DMA is being done from/to, this can have disastrous effects. > > > > > > As Eric has already mentioned, one second might seem like a long time, > > > but we don't really know if it is enough. > > > > > > How about adding a second API (ig. crypto_wait_req_timeout) which > > > supports a calee specified timeout where > > > the calle knows how to correctly deal with timeout and port the > > > relevant call sites to use this? > > > > Yeah, that would work for me. I guess we could just swap the testmgr to > > use this timeout API, as it is quite clear it should timeout rather than > > wait indefinitely, and afaics, the data buffers it uses are limited > > size. It doesn't really matter for it whether the timeout is 1 second or > > 10 seconds, as long as it eventually times out. > > > As long as you avoid releasing the memory used on timeout, that should > work well, I think. > The memory is always going to be freed eventually, though. Although the crypto tests currently reuse the input/output buffers and the request structure from one test to the next, they're freed at the end of the tests. Also, it's unsafe for one request structure to be used for multiple requests concurrently anyway. I think crypto_wait_req_timeout() would just be fundamentally unsafe. Couldn't you just use CONFIG_DETECT_HUNG_TASK=y instead? It should report if any thread is blocked for too long. - Eric
On 08/11/2019 04:27, Eric Biggers wrote: > On Wed, Nov 06, 2019 at 09:33:20AM +0200, Gilad Ben-Yossef wrote: >> On Wed, Nov 6, 2019 at 9:25 AM Tero Kristo <t-kristo@ti.com> wrote: >>> >>> On 06/11/2019 08:39, Gilad Ben-Yossef wrote: >>>> Hi, >>>> >>>> >>>> On Thu, Oct 17, 2019 at 3:26 PM Tero Kristo <t-kristo@ti.com> wrote: >>>>> >>>>> Currently crypto_wait_req waits indefinitely for an async crypto request >>>>> to complete. This is bad as it can cause for example the crypto test >>>>> manager to hang without any notification as to why it has happened. >>>>> Instead of waiting indefinitely, add a 1 second timeout to the call, >>>>> and provide a warning print if a timeout happens. >>>> >>>> While the incentive is clear and positive, this suggested solution >>>> creates problems of its own. >>>> In many (most?) cases where we are waiting here, we are waiting for a >>>> DMA operation to finish from hardware. >>>> Exiting while this pending DMA operation is not finished, even with a >>>> proper error return value, is dangerous because >>>> unless the calling code takes great care to not release the memory the >>>> DMA is being done from/to, this can have disastrous effects. >>>> >>>> As Eric has already mentioned, one second might seem like a long time, >>>> but we don't really know if it is enough. >>>> >>>> How about adding a second API (ig. crypto_wait_req_timeout) which >>>> supports a calee specified timeout where >>>> the calle knows how to correctly deal with timeout and port the >>>> relevant call sites to use this? >>> >>> Yeah, that would work for me. I guess we could just swap the testmgr to >>> use this timeout API, as it is quite clear it should timeout rather than >>> wait indefinitely, and afaics, the data buffers it uses are limited >>> size. It doesn't really matter for it whether the timeout is 1 second or >>> 10 seconds, as long as it eventually times out. >> >> >> As long as you avoid releasing the memory used on timeout, that should >> work well, I think. >> > > The memory is always going to be freed eventually, though. Although the crypto > tests currently reuse the input/output buffers and the request structure from > one test to the next, they're freed at the end of the tests. Also, it's unsafe > for one request structure to be used for multiple requests concurrently anyway. > > I think crypto_wait_req_timeout() would just be fundamentally unsafe. > > Couldn't you just use CONFIG_DETECT_HUNG_TASK=y instead? It should report if > any thread is blocked for too long. The problem is not detecting a hung task, the problem is determining what caused the hang. Personally I don't care if the system dies if a crypto accelerator self test has failed, as long as I get reported about the exact nature of the failure. The failures are expected to happen only in development phase of a crypto driver. With the timeout patch in place, I get reported what exact crypto test case failed and I can focus my debug efforts on that one. Anyways, as said this is just a nice to have patch, and can be dropped no issues there. I was just thinking some other people might find it useful also. -Tero -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On Fri, Nov 08, 2019 at 09:40:57AM +0200, Tero Kristo wrote: > > The problem is not detecting a hung task, the problem is determining what > caused the hang. Personally I don't care if the system dies if a crypto > accelerator self test has failed, as long as I get reported about the exact > nature of the failure. The failures are expected to happen only in > development phase of a crypto driver. > > With the timeout patch in place, I get reported what exact crypto test case > failed and I can focus my debug efforts on that one. If that's all you need then how about just making the wait killable? Cheers,
On 08/11/2019 11:16, Herbert Xu wrote: > On Fri, Nov 08, 2019 at 09:40:57AM +0200, Tero Kristo wrote: >> >> The problem is not detecting a hung task, the problem is determining what >> caused the hang. Personally I don't care if the system dies if a crypto >> accelerator self test has failed, as long as I get reported about the exact >> nature of the failure. The failures are expected to happen only in >> development phase of a crypto driver. >> >> With the timeout patch in place, I get reported what exact crypto test case >> failed and I can focus my debug efforts on that one. > > If that's all you need then how about just making the wait killable? Yeah, that would be an alternative. -Tero -- Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
On Fri, Nov 08, 2019 at 11:22:48AM +0200, Tero Kristo wrote: > On 08/11/2019 11:16, Herbert Xu wrote: > > On Fri, Nov 08, 2019 at 09:40:57AM +0200, Tero Kristo wrote: > > > > > > The problem is not detecting a hung task, the problem is determining what > > > caused the hang. Personally I don't care if the system dies if a crypto > > > accelerator self test has failed, as long as I get reported about the exact > > > nature of the failure. The failures are expected to happen only in > > > development phase of a crypto driver. > > > > > > With the timeout patch in place, I get reported what exact crypto test case > > > failed and I can focus my debug efforts on that one. > > > > If that's all you need then how about just making the wait killable? > > Yeah, that would be an alternative. > I don't see how making crypto_wait_req killable would be any better than adding a timeout, since in both cases the crypto operation would still be proceeding in the background while things are being freed. Would it help if the crypto self-tests printed a pr_debug() message when starting each test vector? These wouldn't be shown by default, but it would be possible to enable them using dynamic-debug or by adding '#define DEBUG' to the top of the source file. - Eric
On Fri, Nov 08, 2019 at 06:27:49PM -0800, Eric Biggers wrote: > > I don't see how making crypto_wait_req killable would be any better than adding > a timeout, since in both cases the crypto operation would still be proceeding in > the background while things are being freed. Right, you would need to modify the caller to actually distinguish between the killed case vs. actual completion. > Would it help if the crypto self-tests printed a pr_debug() message when > starting each test vector? These wouldn't be shown by default, but it would be > possible to enable them using dynamic-debug or by adding '#define DEBUG' to the > top of the source file. This should be simpler to implement. Thanks,
diff --git a/include/linux/crypto.h b/include/linux/crypto.h index 19ea3a371d7b..b8f0e5c3cc0c 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -682,8 +682,15 @@ static inline int crypto_wait_req(int err, struct crypto_wait *wait) switch (err) { case -EINPROGRESS: case -EBUSY: - wait_for_completion(&wait->completion); + err = wait_for_completion_timeout(&wait->completion, + msecs_to_jiffies(1000)); reinit_completion(&wait->completion); + if (!err) { + pr_err("%s: timeout for %p\n", __func__, wait); + err = -ETIMEDOUT; + break; + } + err = wait->err; break; };
Currently crypto_wait_req waits indefinitely for an async crypto request to complete. This is bad as it can cause for example the crypto test manager to hang without any notification as to why it has happened. Instead of waiting indefinitely, add a 1 second timeout to the call, and provide a warning print if a timeout happens. Signed-off-by: Tero Kristo <t-kristo@ti.com> --- include/linux/crypto.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)