diff mbox series

[1/4] tpm_tis: Clean up locality release

Message ID 20200929223216.22584-2-James.Bottomley@HansenPartnership.com (mailing list archive)
State New, archived
Headers show
Series tpm_tis: fix interrupts (again) | expand

Commit Message

James Bottomley Sept. 29, 2020, 10:32 p.m. UTC
The current release locality code seems to be based on the
misunderstanding that the TPM interrupts when a locality is released:
it doesn't, only when the locality is acquired.

Furthermore, there seems to be no point in waiting for the locality to
be released.  All it does is penalize the last TPM user.  However, if
there's no next TPM user, this is a pointless wait and if there is a
next TPM user, they'll pay the penalty waiting for the new locality
(or possibly not if it's the same as the old locality).

Fix the code by making release_locality a simple write to release the
locality, with no waiting for completion.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 drivers/char/tpm/tpm_tis_core.c | 47 +--------------------------------
 1 file changed, 1 insertion(+), 46 deletions(-)

Comments

Jarkko Sakkinen Sept. 30, 2020, 2:26 a.m. UTC | #1
On Tue, Sep 29, 2020 at 03:32:13PM -0700, James Bottomley wrote:
> The current release locality code seems to be based on the
> misunderstanding that the TPM interrupts when a locality is released:
> it doesn't, only when the locality is acquired.
> 
> Furthermore, there seems to be no point in waiting for the locality to
> be released.  All it does is penalize the last TPM user.  However, if
> there's no next TPM user, this is a pointless wait and if there is a
> next TPM user, they'll pay the penalty waiting for the new locality
> (or possibly not if it's the same as the old locality).
> 
> Fix the code by making release_locality as simple write to release
> with no waiting for completion.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

Adding Jerry for feedback.

Probably should have (if accepted).

Fixes: 33bafe90824b ("tpm_tis: verify locality released before returning from release_locality")

> ---
>  drivers/char/tpm/tpm_tis_core.c | 47 +--------------------------------
>  1 file changed, 1 insertion(+), 46 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 92c51c6cfd1b..a9fa40714c64 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -134,58 +134,13 @@ static bool check_locality(struct tpm_chip *chip, int l)
>  	return false;
>  }
>  
> -static bool locality_inactive(struct tpm_chip *chip, int l)
> -{
> -	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> -	int rc;
> -	u8 access;
> -
> -	rc = tpm_tis_read8(priv, TPM_ACCESS(l), &access);
> -	if (rc < 0)
> -		return false;
> -
> -	if ((access & (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY))
> -	    == TPM_ACCESS_VALID)
> -		return true;
> -
> -	return false;
> -}
> -
>  static int release_locality(struct tpm_chip *chip, int l)

Should be void.

>  {
>  	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> -	unsigned long stop, timeout;
> -	long rc;
>  
>  	tpm_tis_write8(priv, TPM_ACCESS(l), TPM_ACCESS_ACTIVE_LOCALITY);
>  
> -	stop = jiffies + chip->timeout_a;
> -
> -	if (chip->flags & TPM_CHIP_FLAG_IRQ) {
> -again:
> -		timeout = stop - jiffies;
> -		if ((long)timeout <= 0)
> -			return -1;
> -
> -		rc = wait_event_interruptible_timeout(priv->int_queue,
> -						      (locality_inactive(chip, l)),
> -						      timeout);
> -
> -		if (rc > 0)
> -			return 0;
> -
> -		if (rc == -ERESTARTSYS && freezing(current)) {
> -			clear_thread_flag(TIF_SIGPENDING);
> -			goto again;
> -		}
> -	} else {
> -		do {
> -			if (locality_inactive(chip, l))
> -				return 0;
> -			tpm_msleep(TPM_TIMEOUT);
> -		} while (time_before(jiffies, stop));
> -	}
> -	return -1;
> +	return 0;
>  }
>  
>  static int request_locality(struct tpm_chip *chip, int l)
> -- 
> 2.28.0
> 

/Jarkko
Jarkko Sakkinen Sept. 30, 2020, 2:26 a.m. UTC | #2
On Wed, Sep 30, 2020 at 05:26:10AM +0300, Jarkko Sakkinen wrote:
> Adding Jerry for feedback.

Ugh, sorry, Jerry was already in the CC list.

/Jarkko
Jerry Snitselaar Sept. 30, 2020, 9:19 p.m. UTC | #3
James Bottomley @ 2020-09-29 15:32 MST:

> The current release locality code seems to be based on the
> misunderstanding that the TPM interrupts when a locality is released:
> it doesn't, only when the locality is acquired.
>
> Furthermore, there seems to be no point in waiting for the locality to
> be released.  All it does is penalize the last TPM user.  However, if
> there's no next TPM user, this is a pointless wait and if there is a
> next TPM user, they'll pay the penalty waiting for the new locality
> (or possibly not if it's the same as the old locality).
>
> Fix the code by making release_locality as simple write to release
> with no waiting for completion.
>
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> ---
>  drivers/char/tpm/tpm_tis_core.c | 47 +--------------------------------
>  1 file changed, 1 insertion(+), 46 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 92c51c6cfd1b..a9fa40714c64 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -134,58 +134,13 @@ static bool check_locality(struct tpm_chip *chip, int l)
>  	return false;
>  }
>  
> -static bool locality_inactive(struct tpm_chip *chip, int l)
> -{
> -	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> -	int rc;
> -	u8 access;
> -
> -	rc = tpm_tis_read8(priv, TPM_ACCESS(l), &access);
> -	if (rc < 0)
> -		return false;
> -
> -	if ((access & (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY))
> -	    == TPM_ACCESS_VALID)
> -		return true;
> -
> -	return false;
> -}
> -
>  static int release_locality(struct tpm_chip *chip, int l)
>  {
>  	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> -	unsigned long stop, timeout;
> -	long rc;
>  
>  	tpm_tis_write8(priv, TPM_ACCESS(l), TPM_ACCESS_ACTIVE_LOCALITY);
>  
> -	stop = jiffies + chip->timeout_a;
> -
> -	if (chip->flags & TPM_CHIP_FLAG_IRQ) {
> -again:
> -		timeout = stop - jiffies;
> -		if ((long)timeout <= 0)
> -			return -1;
> -
> -		rc = wait_event_interruptible_timeout(priv->int_queue,
> -						      (locality_inactive(chip, l)),
> -						      timeout);
> -
> -		if (rc > 0)
> -			return 0;
> -
> -		if (rc == -ERESTARTSYS && freezing(current)) {
> -			clear_thread_flag(TIF_SIGPENDING);
> -			goto again;
> -		}
> -	} else {
> -		do {
> -			if (locality_inactive(chip, l))
> -				return 0;
> -			tpm_msleep(TPM_TIMEOUT);
> -		} while (time_before(jiffies, stop));
> -	}
> -	return -1;
> +	return 0;
>  }
>  
>  static int request_locality(struct tpm_chip *chip, int l)

My recollection is that this was added because there were some chips
that took so long to release locality that a subsequent request_locality
call was seeing the locality as already active, moving on, and then the
locality was getting released out from under the user.
James Bottomley Sept. 30, 2020, 11:03 p.m. UTC | #4
On Wed, 2020-09-30 at 14:19 -0700, Jerry Snitselaar wrote:
> James Bottomley @ 2020-09-29 15:32 MST:
> 
> > The current release locality code seems to be based on the
> > misunderstanding that the TPM interrupts when a locality is
> > released: it doesn't, only when the locality is acquired.
> > 
> > Furthermore, there seems to be no point in waiting for the locality
> > to be released.  All it does is penalize the last TPM
> > user.  However, if there's no next TPM user, this is a pointless
> > wait and if there is
> > a
> > next TPM user, they'll pay the penalty waiting for the new locality
> > (or possibly not if it's the same as the old locality).
> > 
> > Fix the code by making release_locality as simple write to release
> > with no waiting for completion.
[...]
> My recollection is that this was added because there were some chips
> that took so long to release locality that a subsequent
> request_locality call was seeing the locality as already active,
> moving on, and then the locality was getting released out from under
> the user.

Well, I could simply dump the interrupt code, which can never work and
we could always poll.

However, there also appears to be a bug in our locality requesting
code.  We write the request and wait for the grant, but a grant should
be signalled by not only the ACCESS_ACTIVE_LOCALITY being 1 but also
the ACCESS_REQUEST_USE going to 0.  As you say, if we're slow to
relinquish, ACCESS_ACTIVE_LOCALITY could already be 1 and we'd think we
were granted, but ACCESS_REQUEST_USE should stay 1 until the TPM
actually grants the next request.

If I code up a fix is there any chance you still have access to a
problem TPM?  Mine all seem to grant and release localities fairly
instantaneously.

James
Jerry Snitselaar Oct. 1, 2020, 12:01 a.m. UTC | #5
James Bottomley @ 2020-09-30 16:03 MST:

> On Wed, 2020-09-30 at 14:19 -0700, Jerry Snitselaar wrote:
>> James Bottomley @ 2020-09-29 15:32 MST:
>> 
>> > The current release locality code seems to be based on the
>> > misunderstanding that the TPM interrupts when a locality is
>> > released: it doesn't, only when the locality is acquired.
>> > 
>> > Furthermore, there seems to be no point in waiting for the locality
>> > to be released.  All it does is penalize the last TPM
>> > user.  However, if there's no next TPM user, this is a pointless
>> > wait and if there is
>> > a
>> > next TPM user, they'll pay the penalty waiting for the new locality
>> > (or possibly not if it's the same as the old locality).
>> > 
>> > Fix the code by making release_locality as simple write to release
>> > with no waiting for completion.
> [...]
>> My recollection is that this was added because there were some chips
>> that took so long to release locality that a subsequent
>> request_locality call was seeing the locality as already active,
>> moving on, and then the locality was getting released out from under
>> the user.
>
> Well, I could simply dump the interrupt code, which can never work and
> we could always poll.
>
> However, there also appears to be a bug in our locality requesting
> code.  We write the request and wait for the grant, but a grant should
> be signalled by not only the ACCESS_ACTIVE_LOCALITY being 1 but also
> the ACCESS_REQUEST_USE going to 0.  As you say, if we're slow to
> relinquish, ACCESS_ACTIVE_LOCALITY could already be 1 and we'd think we
> were granted, but ACCESS_REQUEST_USE should stay 1 until the TPM
> actually grants the next request.
>
> If I code up a fix is there any chance you still have access to a
> problem TPM?  Mine all seem to grant and release localities fairly
> instantaneously.
>
> James

Sorry, I seemed to make a mess of it. I don't have access to a system where it
occurred, but cc'ing Laurent since he reported the problem and might
still have access to the system.

I'd say fix up the check for locality request to look at
ACCESS_REQUEST_USE, and go with this patch to clean up locality release.
Hopefully Laurent still has access and can test. I do have a laptop now
where I should be able to test the other bits in your patchset since
this is one of the models that hit the interrupt storm problem when Stefan's
2 patches were originally applied. Lenovo applied a fix to their bios,
but this should still have the older version that has the issue. I'm
on PTO this week, but I will try to spend some time in the next couple
days reproducing and then trying your patches.

Regards,
Jerry
Jarkko Sakkinen Oct. 1, 2020, 2:01 a.m. UTC | #6
On Wed, Sep 30, 2020 at 04:03:25PM -0700, James Bottomley wrote:
> On Wed, 2020-09-30 at 14:19 -0700, Jerry Snitselaar wrote:
> > James Bottomley @ 2020-09-29 15:32 MST:
> > 
> > > The current release locality code seems to be based on the
> > > misunderstanding that the TPM interrupts when a locality is
> > > released: it doesn't, only when the locality is acquired.
> > > 
> > > Furthermore, there seems to be no point in waiting for the locality
> > > to be released.  All it does is penalize the last TPM
> > > user.  However, if there's no next TPM user, this is a pointless
> > > wait and if there is
> > > a
> > > next TPM user, they'll pay the penalty waiting for the new locality
> > > (or possibly not if it's the same as the old locality).
> > > 
> > > Fix the code by making release_locality as simple write to release
> > > with no waiting for completion.
> [...]
> > My recollection is that this was added because there were some chips
> > that took so long to release locality that a subsequent
> > request_locality call was seeing the locality as already active,
> > moving on, and then the locality was getting released out from under
> > the user.
> 
> Well, I could simply dump the interrupt code, which can never work and
> we could always poll.

Side-topic: What is the benefit of using int's in a TPM driver anyway? I
have never had any interest to dive into this with tpm_crb because I
don't have the answer.

*Perhaps* in some smallest form factor battery run devices you could get
some gain in run-time power saving but usually in such situations you
use something similar to TEE to do a measured boot.

/Jarkko
James Bottomley Oct. 1, 2020, 4:49 a.m. UTC | #7
On Thu, 2020-10-01 at 05:01 +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 04:03:25PM -0700, James Bottomley wrote:
> > On Wed, 2020-09-30 at 14:19 -0700, Jerry Snitselaar wrote:
> > > James Bottomley @ 2020-09-29 15:32 MST:
> > > 
> > > > The current release locality code seems to be based on the
> > > > misunderstanding that the TPM interrupts when a locality is
> > > > released: it doesn't, only when the locality is acquired.
> > > > 
> > > > Furthermore, there seems to be no point in waiting for the
> > > > locality to be released.  All it does is penalize the last TPM
> > > > user.  However, if there's no next TPM user, this is a
> > > > pointless wait and if there is a next TPM user, they'll pay the
> > > > penalty waiting for the new locality (or possibly not if it's
> > > > the same as the old locality).
> > > > 
> > > > Fix the code by making release_locality as simple write to
> > > > release with no waiting for completion.
> > [...]
> > > My recollection is that this was added because there were some
> > > chips that took so long to release locality that a subsequent
> > > request_locality call was seeing the locality as already active,
> > > moving on, and then the locality was getting released out from
> > > under the user.
> > 
> > Well, I could simply dump the interrupt code, which can never work
> > and we could always poll.
> 
> Side-topic: What is the benefit of using int's in a TPM driver
> anyway? I have never had any interest to dive into this with tpm_crb
> because I don't have the answer.

polling for events that don't immediately happen is a huge waste of
time.  That's why interrupts were invented in the first place.  If you
poll too fast, you consume wakeups which are really expensive to idle
time and if you poll too slowly you wait too long and your throughput
really tanks.  For stuff like disk and network transfers interrupts are
basically essential.  For less high volume stuff, like the TPM, we can
get away with polling, but it's hugely suboptimal if you have a large
number of events to get through ... like updating the IMA log.

> *Perhaps* in some smallest form factor battery run devices you could
> get some gain in run-time power saving but usually in such situations
> you use something similar to TEE to do a measured boot.

It's not about power saving, it's about doing stuff at the right time.

James
James Bottomley Oct. 1, 2020, 3:58 p.m. UTC | #8
On Wed, 2020-09-30 at 17:01 -0700, Jerry Snitselaar wrote:
> James Bottomley @ 2020-09-30 16:03 MST:
> 
> > On Wed, 2020-09-30 at 14:19 -0700, Jerry Snitselaar wrote:
> > > James Bottomley @ 2020-09-29 15:32 MST:
> > > 
> > > > The current release locality code seems to be based on the
> > > > misunderstanding that the TPM interrupts when a locality is
> > > > released: it doesn't, only when the locality is acquired.
> > > > 
> > > > Furthermore, there seems to be no point in waiting for the
> > > > locality to be released.  All it does is penalize the last TPM
> > > > user.  However, if there's no next TPM user, this is a
> > > > pointless wait and if there is a next TPM user, they'll pay the
> > > > penalty waiting for the new locality (or possibly not if it's
> > > > the same as the old locality).
> > > > 
> > > > Fix the code by making release_locality as simple write to
> > > > release with no waiting for completion.
> > [...]
> > > My recollection is that this was added because there were some
> > > chips that took so long to release locality that a subsequent
> > > request_locality call was seeing the locality as already active,
> > > moving on, and then the locality was getting released out from
> > > under the user.
> > 
> > Well, I could simply dump the interrupt code, which can never work
> > and we could always poll.
> > 
> > However, there also appears to be a bug in our locality requesting
> > code.  We write the request and wait for the grant, but a grant
> > should be signalled by not only the ACCESS_ACTIVE_LOCALITY being 1
> > but also the ACCESS_REQUEST_USE going to 0.  As you say, if we're
> > slow to relinquish, ACCESS_ACTIVE_LOCALITY could already be 1 and
> > we'd think we were granted, but ACCESS_REQUEST_USE should stay 1
> > until the TPM actually grants the next request.
> > 
> > If I code up a fix is there any chance you still have access to a
> > problem TPM?  Mine all seem to grant and release localities fairly
> > instantaneously.
> > 
> > James
> 
> Sorry, I seemed to make a mess of it. I don't have access to a system
> where it occurred, but cc'ing Laurent since he reported the problem
> and might still have access to the system.
> 
> I'd say fix up the check for locality request to look at
> ACCESS_REQUEST_USE, and go with this patch to clean up locality
> release. Hopefully Laurent still has access and can test. I do have a
> laptop now where I should be able to test the other bits in your
> patchset since this is one of the models that hit interrupt storm
> problem when Stefan's 2 patches were originally applied. Lenovo
> applied a fix to their bios, but this should still have the older one
> version that has the issue. I'm on PTO this week, but I will try to
> spend some time in the next couple days reproducing and then trying
> your patches.

Thanks.  I think the patch to fix to request access is very simple ...
it's just to check the request bit has gone to zero, so I've attached
it below.  It seems to work fine for me.

James

---

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 0a86cf392466..5e56e8c67791 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -168,7 +168,8 @@ static bool check_locality(struct tpm_chip *chip, int l)
 	if (rc < 0)
 		return false;
 
-	if ((access & (TPM_ACCESS_ACTIVE_LOCALITY | TPM_ACCESS_VALID)) ==
+	if ((access & (TPM_ACCESS_ACTIVE_LOCALITY | TPM_ACCESS_VALID
+		       | TPM_ACCESS_REQUEST_USE)) ==
 	    (TPM_ACCESS_ACTIVE_LOCALITY | TPM_ACCESS_VALID)) {
 		priv->locality = l;
 		return true;
James Bottomley Oct. 1, 2020, 5:48 p.m. UTC | #9
On Wed, 2020-09-30 at 21:49 -0700, James Bottomley wrote:
> On Thu, 2020-10-01 at 05:01 +0300, Jarkko Sakkinen wrote:
> > On Wed, Sep 30, 2020 at 04:03:25PM -0700, James Bottomley wrote:
> > > On Wed, 2020-09-30 at 14:19 -0700, Jerry Snitselaar wrote:
> > > > James Bottomley @ 2020-09-29 15:32 MST:
> > > > 
> > > > > The current release locality code seems to be based on the
> > > > > misunderstanding that the TPM interrupts when a locality is
> > > > > released: it doesn't, only when the locality is acquired.
> > > > > 
> > > > > Furthermore, there seems to be no point in waiting for the
> > > > > locality to be released.  All it does is penalize the last
> > > > > TPM user.  However, if there's no next TPM user, this is a
> > > > > pointless wait and if there is a next TPM user, they'll pay
> > > > > the penalty waiting for the new locality (or possibly not if
> > > > > it's the same as the old locality).
> > > > > 
> > > > > Fix the code by making release_locality as simple write to
> > > > > release with no waiting for completion.
> > > [...]
> > > > My recollection is that this was added because there were some
> > > > chips that took so long to release locality that a subsequent
> > > > request_locality call was seeing the locality as already
> > > > active, moving on, and then the locality was getting released
> > > > out from under the user.
> > > 
> > > Well, I could simply dump the interrupt code, which can never
> > > work and we could always poll.
> > 
> > Side-topic: What is the benefit of using int's in a TPM driver
> > anyway? I have never had any interest to dive into this with
> > tpm_crb because I don't have the answer.
> 
> polling for events that don't immediately happen is a huge waste of
> time.  That's why interrupts were invented in the first place.  If
> you poll too fast, you consume wakeups which are really expensive to
> idle time and if you poll too slowly you wait too long and your
> throughput really tanks.  For stuff like disk and network transfers
> interrupts are basically essential.  For less high volume stuff, like
> the TPM, we can get away with polling, but it's hugely suboptimal if
> you have a large number of events to get through ... like updating
> the IMA log.

I suppose I should also add that for annoying TPMs that crash if you
poll too often, like the Atmel and my Nuvoton, using interrupts would
be a huge facilitator because you only touch the status register when
you know something has changed and the TPM is expecting you to check.  
Not that this will actually help me: my ACPI tables imply my TPM has no
interrupt line, unfortunately.

James
Laurent Bigonville Jan. 2, 2021, 1:17 a.m. UTC | #10
Le 1/10/20 à 17:58, James Bottomley a écrit :
Hello,
> On Wed, 2020-09-30 at 17:01 -0700, Jerry Snitselaar wrote:
>> James Bottomley @ 2020-09-30 16:03 MST:
>>
>>> On Wed, 2020-09-30 at 14:19 -0700, Jerry Snitselaar wrote:
>>>> James Bottomley @ 2020-09-29 15:32 MST:
>>>>
>>>>> The current release locality code seems to be based on the
>>>>> misunderstanding that the TPM interrupts when a locality is
>>>>> released: it doesn't, only when the locality is acquired.
>>>>>
>>>>> Furthermore, there seems to be no point in waiting for the
>>>>> locality to be released.  All it does is penalize the last TPM
>>>>> user.  However, if there's no next TPM user, this is a
>>>>> pointless wait and if there is a next TPM user, they'll pay the
>>>>> penalty waiting for the new locality (or possibly not if it's
>>>>> the same as the old locality).
>>>>>
>>>>> Fix the code by making release_locality as simple write to
>>>>> release with no waiting for completion.
>>> [...]
>>>> My recollection is that this was added because there were some
>>>> chips that took so long to release locality that a subsequent
>>>> request_locality call was seeing the locality as already active,
>>>> moving on, and then the locality was getting released out from
>>>> under the user.
>>> Well, I could simply dump the interrupt code, which can never work
>>> and we could always poll.
>>>
>>> However, there also appears to be a bug in our locality requesting
>>> code.  We write the request and wait for the grant, but a grant
>>> should be signalled by not only the ACCESS_ACTIVE_LOCALITY being 1
>>> but also the ACCESS_REQUEST_USE going to 0.  As you say, if we're
>>> slow to relinquish, ACCESS_ACTIVE_LOCALITY could already be 1 and
>>> we'd think we were granted, but ACCESS_REQUEST_USE should stay 1
>>> until the TPM actually grants the next request.
>>>
>>> If I code up a fix is there any chance you still have access to a
>>> problem TPM?  Mine all seem to grant and release localities fairly
>>> instantaneously.
>>>
>>> James
>> Sorry, I seemed to make a mess of it. I don't have access to a system
>> where it occurred, but cc'ing Laurent since he reported the problem
>> and might still have access to the system.
>>
>> I'd say fix up the check for locality request to look at
>> ACCESS_REQUEST_USE, and go with this patch to clean up locality
>> release. Hopefully Laurent still has access and can test. I do have a
>> laptop now where I should be able to test the other bits in your
>> patchset since this is one of the models that hit interrupt storm
>> problem when Stefan's 2 patches were originally applied. Lenovo
>> applied a fix to their bios, but this should still have the older one
>> version that has the issue. I'm on PTO this week, but I will try to
>> spend some time in the next couple days reproducing and then trying
>> your patches.
> Thanks.  I think the patch to fix to request access is very simple ...
> it's just to check the request bit has gone to zero, so I've attached
> it below.  It seems to work fine for me.
>
Sorry for the (really) late answer. I still do have access to the same 
system. Do you still need something from me?

But I do have two issues with the tpm chip on that system (probably not 
related to the discussion you were having here) so I'm not sure I will 
be able to easily test that everything is working:

1) The machine is in dualboot with windows 10 and for some reasons, 
every time I'm rebooting between linux and windows the chip is locking 
itself. AFAICS, when rebooting windows multiple times it's not happening. 
And the grace period is around 36h...

2) I just updated to 5.10 today (debian updated the kernel in unstable) 
and I get a WARNING when the tpm_tis module is being loaded:

kernel: ------------[ cut here ]------------
kernel: TPM returned invalid status
kernel: WARNING: CPU: 3 PID: 443 at drivers/char/tpm/tpm_tis_core.c:249 tpm_tis_status+0x86/0xa0 [tpm_tis_core]
kernel: Modules linked in: tpm_tis(+) tpm_tis_core tpm asus_atk0110 rng_core evdev loop(+) firewire_sbp2 msr parport_pc sunrpc ppdev lp parport fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 >
kernel: CPU: 3 PID: 443 Comm: systemd-udevd Tainted: G          I       5.10.0-1-amd64 #1 Debian 5.10.4-1
kernel: Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 0406    04/24/2009
kernel: RIP: 0010:tpm_tis_status+0x86/0xa0 [tpm_tis_core]
kernel: Code: 00 75 30 48 83 c4 18 c3 31 c0 80 3d e3 48 00 00 00 75 e0 48 c7 c7 4c 83 18 c1 88 44 24 07 c6 05 cf 48 00 00 01 e8 9a 57 ce fc <0f> 0b 0f b6 44 24 07 eb c0 e8 bc ca d1 fc 66 66 2e 0f 1f 84 00 00
kernel: RSP: 0018:ffffb98b0076faa0 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff8dc1892a1000 RCX: ffff8dc52dad8a08
kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8dc52dad8a00
kernel: RBP: 00000000ffff5d8a R08: 0000000000000000 R09: ffffb98b0076f8c0
kernel: R10: ffffb98b0076f8b8 R11: ffffffffbe8cb268 R12: 0000000000000016
kernel: R13: ffff8dc183bec000 R14: 0000000000001000 R15: ffffb98b0076fada
kernel: FS:  00007f6916f658c0(0000) GS:ffff8dc52dac0000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007f6915e1ee38 CR3: 00000001205ee000 CR4: 00000000000006e0
kernel: Call Trace:
kernel:  tpm_transmit+0x15f/0x3d0 [tpm]
kernel:  tpm_transmit_cmd+0x25/0x90 [tpm]
kernel:  tpm2_probe+0xe2/0x140 [tpm]
kernel:  tpm_tis_core_init+0x1d5/0x2b0 [tpm_tis_core]
kernel:  ? tpm_tis_init.part.0+0x130/0x130 [tpm_tis]
kernel:  tpm_tis_pnp_init+0xe1/0x110 [tpm_tis]
kernel:  pnp_device_probe+0xaf/0x140
kernel:  really_probe+0x205/0x460
kernel:  driver_probe_device+0xe1/0x150
kernel:  device_driver_attach+0xa1/0xb0
kernel:  __driver_attach+0x8a/0x150
kernel:  ? device_driver_attach+0xb0/0xb0
kernel:  ? device_driver_attach+0xb0/0xb0
kernel:  bus_for_each_dev+0x78/0xc0
kernel:  bus_add_driver+0x12b/0x1e0
kernel:  driver_register+0x8b/0xe0
kernel:  ? 0xffffffffc1193000
kernel:  init_tis+0xa0/0x1000 [tpm_tis]
kernel:  do_one_initcall+0x44/0x1d0
kernel:  ? do_init_module+0x23/0x250
kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
kernel:  do_init_module+0x5c/0x250
kernel:  __do_sys_finit_module+0xb1/0x110
kernel:  do_syscall_64+0x33/0x80
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7f691741f959
kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 55 0c 00 f7 d8 64 89 01 48
kernel: RSP: 002b:00007ffe0c62a958 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
kernel: RAX: ffffffffffffffda RBX: 00005630aad85910 RCX: 00007f691741f959
kernel: RDX: 0000000000000000 RSI: 00007f69175aae4d RDI: 0000000000000012
kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 00005630aad71d88
kernel: R10: 0000000000000012 R11: 0000000000000246 R12: 00007f69175aae4d
kernel: R13: 0000000000000000 R14: 00005630aad864f0 R15: 00005630aad85910
kernel: ---[ end trace 3dd14c12be7cbb7c ]---
kernel: tpm_tis 00:06: 1.2 TPM (device-id 0x6871, rev-id 1)
diff mbox series

Patch

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 92c51c6cfd1b..a9fa40714c64 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -134,58 +134,13 @@  static bool check_locality(struct tpm_chip *chip, int l)
 	return false;
 }
 
-static bool locality_inactive(struct tpm_chip *chip, int l)
-{
-	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
-	int rc;
-	u8 access;
-
-	rc = tpm_tis_read8(priv, TPM_ACCESS(l), &access);
-	if (rc < 0)
-		return false;
-
-	if ((access & (TPM_ACCESS_VALID | TPM_ACCESS_ACTIVE_LOCALITY))
-	    == TPM_ACCESS_VALID)
-		return true;
-
-	return false;
-}
-
 static int release_locality(struct tpm_chip *chip, int l)
 {
 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
-	unsigned long stop, timeout;
-	long rc;
 
 	tpm_tis_write8(priv, TPM_ACCESS(l), TPM_ACCESS_ACTIVE_LOCALITY);
 
-	stop = jiffies + chip->timeout_a;
-
-	if (chip->flags & TPM_CHIP_FLAG_IRQ) {
-again:
-		timeout = stop - jiffies;
-		if ((long)timeout <= 0)
-			return -1;
-
-		rc = wait_event_interruptible_timeout(priv->int_queue,
-						      (locality_inactive(chip, l)),
-						      timeout);
-
-		if (rc > 0)
-			return 0;
-
-		if (rc == -ERESTARTSYS && freezing(current)) {
-			clear_thread_flag(TIF_SIGPENDING);
-			goto again;
-		}
-	} else {
-		do {
-			if (locality_inactive(chip, l))
-				return 0;
-			tpm_msleep(TPM_TIMEOUT);
-		} while (time_before(jiffies, stop));
-	}
-	return -1;
+	return 0;
 }
 
 static int request_locality(struct tpm_chip *chip, int l)