[RFC,8/9] tpm_tis_spi: add delay between wait state retries

Message ID 20171208184658.9588-9-Alexander.Steffen@infineon.com (mailing list archive)
State Rejected

Commit Message

Alexander Steffen Dec. 8, 2017, 6:46 p.m. UTC
There do not seem to be any real limits on the amount/duration of wait
states that the TPM is allowed to generate. So at least give the TPM some
time to clean up the situation that caused the wait states instead of
retrying the transfers as fast as possible.

Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
---
 drivers/char/tpm/tpm_tis_spi.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Mimi Zohar Jan. 11, 2018, 7:46 p.m. UTC | #1
Hi Alexander,

On Fri, 2017-12-08 at 19:46 +0100, Alexander Steffen wrote:
> There do not seem to be any real limits on the amount/duration of wait
> states that the TPM is allowed to generate. So at least give the TPM some
> time to clean up the situation that caused the wait states instead of
> retrying the transfers as fast as possible.

Without this patch, the TPM performance on the pi is amazing!  A
thousand extends takes ~6.5 seconds.  Unfortunately, the same thousand
extends with this patch takes more than 2 minutes.  TPM_TIMEOUT (5 msecs) is
a really long time.  Why so long?

Mimi
  

> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
> ---
>  drivers/char/tpm/tpm_tis_spi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
> index 2cc6aa9..e53c9c3 100644
> --- a/drivers/char/tpm/tpm_tis_spi.c
> +++ b/drivers/char/tpm/tpm_tis_spi.c
> @@ -88,7 +88,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
> 
>  		if ((phy->iobuf[3] & 0x01) == 0) {
>  			// handle SPI wait states
> -			for (i = 0; i < TPM_RETRY; i++) {
> +			for (i = 1; i <= TPM_RETRY; i++) {
>  				phy->iobuf[0] = addr;
>  				spi_xfer.len = 1;
>  				spi_message_init(&m);
> @@ -98,9 +98,10 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>  					goto exit;
>  				if (phy->iobuf[0] & 0x01)
>  					break;
> +				tpm_msleep(i * TPM_TIMEOUT);
>  			}
> 
> -			if (i == TPM_RETRY) {
> +			if (i > TPM_RETRY) {
>  				spi_xfer.tx_buf = NULL;
>  				spi_xfer.rx_buf = NULL;
>  				spi_xfer.len = 0;
Alexander Steffen Jan. 12, 2018, 8:28 a.m. UTC | #2
On 11.01.2018 20:46, Mimi Zohar wrote:
> Hi Alexander,
> 
> On Fri, 2017-12-08 at 19:46 +0100, Alexander Steffen wrote:
>> There do not seem to be any real limits on the amount/duration of wait
>> states that the TPM is allowed to generate. So at least give the TPM some
>> time to clean up the situation that caused the wait states instead of
>> retrying the transfers as fast as possible.
> 
> Without this patch, the TPM performance on the pi is amazing!  A
> thousand extends takes ~6.5 seconds.  Unfortunately, the same thousand
> extends with this patch takes more than 2 minutes.  TPM_TIMEOUT (5 msecs) is
> a really long time.  Why so long?

My problem is the lack of specification for the wait state 
functionality. How many wait states may be signaled by the TPM? For how 
long may the TPM signal wait states? Do you know any more details?

The current implementation is based on the assumption that wait states 
are the exception rather than the rule, so that longer delays are 
acceptable. And without knowing for how long wait states can happen, I 
chose rather longer delays (with this patch the TPM has several seconds 
to clear wait states) to avoid failing on some unknown TPM 
implementation in some special cases.
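
(For reference: with TPM_RETRY = 50 and TPM_TIMEOUT = 5 ms, the
cumulative sleep across all retries is 5 ms * (1 + 2 + ... + 50) =
6375 ms, so in the worst case the TPM gets roughly 6.4 seconds.)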

What would be a better implementation? No delay and simply retry for 
five seconds?

What TPMs are you using for your tests? At least for the TPMs that I 
have available for my tests (with a FIFO size of ~256 bytes) I would not 
expect any wait states for extend commands.

Alexander

> 
> Mimi
>    
> 
>> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>
>> ---
>>   drivers/char/tpm/tpm_tis_spi.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
>> index 2cc6aa9..e53c9c3 100644
>> --- a/drivers/char/tpm/tpm_tis_spi.c
>> +++ b/drivers/char/tpm/tpm_tis_spi.c
>> @@ -88,7 +88,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>>
>>   		if ((phy->iobuf[3] & 0x01) == 0) {
>>   			// handle SPI wait states
>> -			for (i = 0; i < TPM_RETRY; i++) {
>> +			for (i = 1; i <= TPM_RETRY; i++) {
>>   				phy->iobuf[0] = addr;
>>   				spi_xfer.len = 1;
>>   				spi_message_init(&m);
>> @@ -98,9 +98,10 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
>>   					goto exit;
>>   				if (phy->iobuf[0] & 0x01)
>>   					break;
>> +				tpm_msleep(i * TPM_TIMEOUT);
>>   			}
>>
>> -			if (i == TPM_RETRY) {
>> +			if (i > TPM_RETRY) {
>>   				spi_xfer.tx_buf = NULL;
>>   				spi_xfer.rx_buf = NULL;
>>   				spi_xfer.len = 0;
> 
>
Mimi Zohar Jan. 12, 2018, 2:53 p.m. UTC | #3
On Fri, 2018-01-12 at 09:28 +0100, Alexander Steffen wrote:
> On 11.01.2018 20:46, Mimi Zohar wrote:
> > Hi Alexander,
> > 
> > On Fri, 2017-12-08 at 19:46 +0100, Alexander Steffen wrote:
> >> There do not seem to be any real limits on the amount/duration of wait
> >> states that the TPM is allowed to generate. So at least give the TPM some
> >> time to clean up the situation that caused the wait states instead of
> >> retrying the transfers as fast as possible.
> > 
> > Without this patch, the TPM performance on the pi is amazing!  A
> > thousand extends takes ~6.5 seconds.  Unfortunately, the same thousand
> > extends with this patch takes more than 2 minutes.  TPM_TIMEOUT (5 msecs) is
> > a really long time.  Why so long?
> 
> My problem is the lack of specification for the wait state 
> functionality. How many wait states may be signaled by the TPM? For how 
> long may the TPM signal wait states? Do you know any more details?

First, let me apologize for not thanking you for making these changes
in the first place.  With just the burstcount patch, but without your
wait state patches, the pi doesn't boot.  I now have a test
environment and can reproduce the problem.  :)  

With your wait state patches, but without this patch, the time is ~14
seconds per thousand extends, as opposed to the ~6.5 I reported above.
 [The ~6.5 seconds is for already measured files.  Files that have
already been measured are not re-measured, so the TPM is not extended
again.  On other systems, "re-measuring" a thousand files takes less than
1 sec.  I'm not sure why it is taking so long on the pi.]

Without any sleeps, I'm seeing that when it goes into a wait state, it
mostly loops once, every so often 3 times, but printing the counter
introduces delays and probably skews the results.  Getting more
accurate results would require aggregating them and printing only
after N extends.  I'll try to get that
information for you next week.

> The current implementation is based on the assumption that wait states 
> are the exception rather than the rule, so that longer delays are 
> acceptable. And without knowing for how long wait states can happen, I 
> chose rather longer delays (with this patch the TPM has several seconds 
> to clear wait states) to avoid failing on some unknown TPM 
> implementation in some special cases.
> 
> What would be a better implementation? No delay and simply retry for 
> five seconds?
  
Perhaps use TPM_POLL_SLEEP (1 msec), which Nayna introduced to sleep
within a max time loop.  Instead of using the number of retries, which
is currently defined as 50, define a max end time.  You could still
increment the timeout, but instead of "i * TPM_TIMEOUT" use "i *
TPM_POLL_SLEEP".  Although usleep_range() is a lot better than msleep(),
there is still some overhead.  With tpm_msleep(), we don't have a way
of sleeping for less than 1 msec.  Perhaps the first time, when "i ==
0", try again without sleeping.

> What TPMs are you using for your tests? At least for the TPMs that I 
> have available for my tests (with a FIFO size of ~256 bytes) I would not 
> expect any wait states for extend commands.

I'm also surprised. The TPM on the pi isn't on a real SPI bus.  The
test is a tight loop, which forces N number of extends.  The pi is at
work, but I'm working from home today. :(  I'll have to get this info
next week.

Have you tried booting the pi with the "ima_tcb" boot command line
option?  Do you have any performance results that you could share? 

thanks,

Mimi
Mimi Zohar Jan. 15, 2018, 10:30 p.m. UTC | #4
> What would be a better implementation? No delay and simply retry for 
> five seconds?
> 
> What TPMs are you using for your tests? At least for the TPMs that I 
> have available for my tests (with a FIFO size of ~256 bytes) I would not 
> expect any wait states for extend commands.

The TPM burstcount is 32.  Unfortunately, with this patch set the
performance on the pi is worse than without it.  Without this patch
set we're seeing ~14 secs for a thousand measurements vs. either ~38s
or ~23s, depending on the wait sleep length introduced in patch 8/9.
This testing was done on a pi with an emulated SPI bus.

On systems with a large burstcount, there is no difference, but on
systems with a small burstcount, we're seeing an improvement of ~39%
(~14s base time, ~8.5s with patches).

Does anyone have a system with a TPM on a real SPI bus and is willing
to test this patch set?

thanks,

Mimi
Mimi Zohar Jan. 17, 2018, 5:15 p.m. UTC | #5
On Mon, 2018-01-15 at 17:30 -0500, Mimi Zohar wrote:
> > What would be a better implementation? No delay and simply retry for 
> > five seconds?
> > 
> > What TPMs are you using for your tests? At least for the TPMs that I 
> > have available for my tests (with a FIFO size of ~256 bytes) I would not 
> > expect any wait states for extend commands.
> 
> The TPM burstcount is 32.  Unfortunately, with this patch set the
> performance on the pi is worse than without it.  Without this patch
> set we're seeing ~14 secs for a thousand measurements vs. either ~38s
> or ~23s, depending on the wait sleep length introduced in patch 8/9.

From my response it might not have been clear that with all of the
patches applied except this one (8/9), which calls tpm_msleep(), the
performance is equivalent to having none of the patches.  With this
patch set, whether or not tpm_msleep() is called, it loops 2 or 3
times.

Mimi
Alexander Steffen Jan. 17, 2018, 6:58 p.m. UTC | #6
On 17.01.2018 18:15, Mimi Zohar wrote:
> On Mon, 2018-01-15 at 17:30 -0500, Mimi Zohar wrote:
>>> What would be a better implementation? No delay and simply retry for
>>> five seconds?
>>>
>>> What TPMs are you using for your tests? At least for the TPMs that I
>>> have available for my tests (with a FIFO size of ~256 bytes) I would not
>>> expect any wait states for extend commands.
>>
>> The TPM burstcount is 32.  Unfortunately, with this patch set the
>> performance on the pi is worse than without it.  Without this patch
>> set we're seeing ~14 secs for a thousand measurements vs. either ~38s
>> or ~23s, depending on the wait sleep length introduced in patch 8/9.
> 
> From my response it might not have been clear that with all of the
> patches applied except this one (8/9), which calls tpm_msleep(), the
> performance is equivalent to having none of the patches.  With this
> patch set, whether or not tpm_msleep() is called, it loops 2 or 3
> times.

Thanks for your tests. I haven't yet done such extensive performance 
tests for those changes. But I've got some performance tests that I 
still want to run to see what performance impact I can measure with my TPMs.

It seems clear to me that for optimum performance there shouldn't be any 
sleeps, at least not for the first few retries. If I use a time limit 
instead of counting the number of retries, are sleeps necessary at all? 
The SPI bus is locked anyway, so nothing else can use it while we are 
sleeping. The SPI transfers should slow us down sufficiently, so that we 
are not consuming 100% of CPU time. And in the common cases (at least 
those that we know) the wait states seem to clear up pretty quickly.

If there are no objections, I'll rewrite the patch in such a way (time 
limit of several seconds, no sleeps). Maybe it is also worth trying to 
see what performance can be gained by removing more sleeps, in 
wait_for_tpm_stat and elsewhere. Perhaps the first millisecond(s) or so 
should always be busy waiting, and sleeps should only come afterwards.
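
Something like this, roughly (sketch only, not tested; the five-second
limit is just a placeholder):

	/* Retry until a deadline, without any sleeps. The SPI
	 * transfers themselves limit the polling rate, so we are not
	 * spinning at 100% CPU while the bus is locked.
	 */
	unsigned long deadline = jiffies + msecs_to_jiffies(5000);

	do {
		phy->iobuf[0] = addr;
		spi_xfer.len = 1;
		spi_message_init(&m);
		spi_message_add_tail(&spi_xfer, &m);
		ret = spi_sync_locked(phy->spi_device, &m);
		if (ret < 0)
			goto exit;
	} while (!(phy->iobuf[0] & 0x01) &&
		 time_before(jiffies, deadline));

	if (!(phy->iobuf[0] & 0x01)) {
		/* wait states never cleared; fail the transfer */
	}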

Alexander
Jarkko Sakkinen Jan. 18, 2018, 6:32 p.m. UTC | #7
On Fri, Dec 08, 2017 at 07:46:57PM +0100, Alexander Steffen wrote:
> There do not seem to be any real limits on the amount/duration of wait
> states that the TPM is allowed to generate. So at least give the TPM some
> time to clean up the situation that caused the wait states instead of
> retrying the transfers as fast as possible.
> 
> Signed-off-by: Alexander Steffen <Alexander.Steffen@infineon.com>

I agree with Mimi's comment that this change should be based on
time rather than on retries.

/Jarkko

Patch

diff --git a/drivers/char/tpm/tpm_tis_spi.c b/drivers/char/tpm/tpm_tis_spi.c
index 2cc6aa9..e53c9c3 100644
--- a/drivers/char/tpm/tpm_tis_spi.c
+++ b/drivers/char/tpm/tpm_tis_spi.c
@@ -88,7 +88,7 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
 
 		if ((phy->iobuf[3] & 0x01) == 0) {
 			// handle SPI wait states
-			for (i = 0; i < TPM_RETRY; i++) {
+			for (i = 1; i <= TPM_RETRY; i++) {
 				phy->iobuf[0] = addr;
 				spi_xfer.len = 1;
 				spi_message_init(&m);
@@ -98,9 +98,10 @@ static int tpm_tis_spi_transfer(struct tpm_tis_data *data, u32 addr, u16 len,
 					goto exit;
 				if (phy->iobuf[0] & 0x01)
 					break;
+				tpm_msleep(i * TPM_TIMEOUT);
 			}
 
-			if (i == TPM_RETRY) {
+			if (i > TPM_RETRY) {
 				spi_xfer.tx_buf = NULL;
 				spi_xfer.rx_buf = NULL;
 				spi_xfer.len = 0;