diff mbox series

[1/3] tpm: protect against locality counter underflow

Message ID 20240131170824.6183-2-dpsmith@apertussolutions.com (mailing list archive)
State New
Headers show
Series [1/3] tpm: protect against locality counter underflow | expand

Commit Message

Daniel P. Smith Jan. 31, 2024, 5:08 p.m. UTC
Commit 933bfc5ad213 introduced the use of a locality counter to control when a
locality request is allowed to be sent to the TPM. In the commit, the counter
is indiscriminately decremented. Thus creating a situation for an integer
underflow of the counter.

Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
Reported-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
Fixes: 933bfc5ad213 ("tpm, tpm: Implement usage counter for locality")
---
 drivers/char/tpm/tpm_tis_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jarkko Sakkinen Feb. 1, 2024, 10:21 p.m. UTC | #1
On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> locality request is allowed to be sent to the TPM. In the commit, the counter
> is indiscriminately decremented. Thus creating a situation for an integer
> underflow of the counter.

What is the sequence of events that leads to this triggering the
underflow? This information should be represent in the commit message.

>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> Reported-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
> Fixes: 933bfc5ad213 ("tpm, tpm: Implement usage counter for locality")
> ---
>  drivers/char/tpm/tpm_tis_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 1b350412d8a6..4176d3bd1f04 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -180,7 +180,8 @@ static int tpm_tis_relinquish_locality(struct tpm_chip *chip, int l)
>  	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>  
>  	mutex_lock(&priv->locality_count_mutex);
> -	priv->locality_count--;
> +	if (priv->locality_count > 0)
> +		priv->locality_count--;
>  	if (priv->locality_count == 0)
>  		__tpm_tis_relinquish_locality(priv, l);
>  	mutex_unlock(&priv->locality_count_mutex);

BR, Jarkko
Lino Sanfilippo Feb. 2, 2024, 3:08 a.m. UTC | #2
On 01.02.24 23:21, Jarkko Sakkinen wrote:

> 
> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>> locality request is allowed to be sent to the TPM. In the commit, the counter
>> is indiscriminately decremented. Thus creating a situation for an integer
>> underflow of the counter.
> 
> What is the sequence of events that leads to this triggering the
> underflow? This information should be represent in the commit message.
> 

AFAIU this is:

1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
for the first time, but since a locality is (unexpectedly) already active check_locality() and consequently
__tpm_tis_request_locality() return "true". This prevents the locality_counter from being increased
to 1, see 

	ret = __tpm_tis_request_locality(chip, l);
	if (!ret) /* Counter not increased since ret == 1 */
		priv->locality_count++;

in tpm_tis_request_locality().

If now the locality is released the counter is decreased to below zero (resulting
in an underflow since "locality_counter" is an unsigned int.
Jarkko Sakkinen Feb. 12, 2024, 8:05 p.m. UTC | #3
On Fri Feb 2, 2024 at 5:08 AM EET, Lino Sanfilippo wrote:
>
>
> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>
> > 
> > On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> >> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> >> locality request is allowed to be sent to the TPM. In the commit, the counter
> >> is indiscriminately decremented. Thus creating a situation for an integer
> >> underflow of the counter.
> > 
> > What is the sequence of events that leads to this triggering the
> > underflow? This information should be represent in the commit message.
> > 
>
> AFAIU this is:
>
> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
> for the first time, but since a locality is (unexpectedly) already active check_locality() and consequently
> __tpm_tis_request_locality() return "true". This prevents the locality_counter from being increased
> to 1, see 
>
> 	ret = __tpm_tis_request_locality(chip, l);
> 	if (!ret) /* Counter not increased since ret == 1 */
> 		priv->locality_count++;
>
> in tpm_tis_request_locality().
>
> If now the locality is released the counter is decreased to below zero (resulting
> in an underflow since "locality_counter" is an unsigned int. 

Thanks, Daniel, can you transcript this to the commit message?

BR, Jarkko
Daniel P. Smith Feb. 19, 2024, 5:54 p.m. UTC | #4
On 2/12/24 15:05, Jarkko Sakkinen wrote:
> On Fri Feb 2, 2024 at 5:08 AM EET, Lino Sanfilippo wrote:
>>
>>
>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>
>>>
>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
>>>> is indiscriminately decremented. Thus creating a situation for an integer
>>>> underflow of the counter.
>>>
>>> What is the sequence of events that leads to this triggering the
>>> underflow? This information should be represent in the commit message.
>>>
>>
>> AFAIU this is:
>>
>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
>> for the first time, but since a locality is (unexpectedly) already active check_locality() and consequently
>> __tpm_tis_request_locality() return "true". This prevents the locality_counter from being increased
>> to 1, see
>>
>> 	ret = __tpm_tis_request_locality(chip, l);
>> 	if (!ret) /* Counter not increased since ret == 1 */
>> 		priv->locality_count++;
>>
>> in tpm_tis_request_locality().
>>
>> If now the locality is released the counter is decreased to below zero (resulting
>> in an underflow since "locality_counter" is an unsigned int.
> 
> Thanks, Daniel, can you transcript this to the commit message?

ack

v/r,
dps
Alexander Steffen Feb. 20, 2024, 6:42 p.m. UTC | #5
On 02.02.2024 04:08, Lino Sanfilippo wrote:
> On 01.02.24 23:21, Jarkko Sakkinen wrote:
> 
>>
>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>>> locality request is allowed to be sent to the TPM. In the commit, the counter
>>> is indiscriminately decremented. Thus creating a situation for an integer
>>> underflow of the counter.
>>
>> What is the sequence of events that leads to this triggering the
>> underflow? This information should be represent in the commit message.
>>
> 
> AFAIU this is:
> 
> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
> for the first time, but since a locality is (unexpectedly) already active
> check_locality() and consequently __tpm_tis_request_locality() return "true".

check_locality() returns true, but __tpm_tis_request_locality() returns 
the requested locality. Currently, this is always 0, so the check for 
!ret will always correctly indicate success and increment the 
locality_count.

But since theoretically a locality != 0 could be requested, the correct 
fix would be to check for something like ret >= 0 or ret == l instead of 
!ret. Then the counter will also be incremented correctly for localities 
!= 0, and no underflow will happen later on. Therefore, explicitly 
checking for an underflow is unnecessary and hides the real problem.

> This prevents the locality_counter from being increased to 1, see
> 
>          ret = __tpm_tis_request_locality(chip, l);
>          if (!ret) /* Counter not increased since ret == 1 */
>                  priv->locality_count++;
> 
> in tpm_tis_request_locality().
> 
> If now the locality is released the counter is decreased to below zero (resulting
> in an underflow since "locality_counter" is an unsigned int.
Jarkko Sakkinen Feb. 20, 2024, 7:04 p.m. UTC | #6
On Tue Feb 20, 2024 at 8:42 PM EET, Alexander Steffen wrote:
> On 02.02.2024 04:08, Lino Sanfilippo wrote:
> > On 01.02.24 23:21, Jarkko Sakkinen wrote:
> > 
> >>
> >> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> >>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> >>> locality request is allowed to be sent to the TPM. In the commit, the counter
> >>> is indiscriminately decremented. Thus creating a situation for an integer
> >>> underflow of the counter.
> >>
> >> What is the sequence of events that leads to this triggering the
> >> underflow? This information should be represent in the commit message.
> >>
> > 
> > AFAIU this is:
> > 
> > 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
> > for the first time, but since a locality is (unexpectedly) already active
> > check_locality() and consequently __tpm_tis_request_locality() return "true".
>
> check_locality() returns true, but __tpm_tis_request_locality() returns 
> the requested locality. Currently, this is always 0, so the check for 
> !ret will always correctly indicate success and increment the 
> locality_count.
>
> But since theoretically a locality != 0 could be requested, the correct 
> fix would be to check for something like ret >= 0 or ret == l instead of 
> !ret. Then the counter will also be incremented correctly for localities 
> != 0, and no underflow will happen later on. Therefore, explicitly 
> checking for an underflow is unnecessary and hides the real problem.

Good point.

I think that the check should contain info-level klog message of the
event together with the check against the underflow. I think this is
very useful info for live systems.

BR, Jarkko
Lino Sanfilippo Feb. 20, 2024, 8:54 p.m. UTC | #7
Hi,

On 20.02.24 19:42, Alexander Steffen wrote:
> ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
> 
> 
> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>
>>>
>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
>>>> is indiscriminately decremented. Thus creating a situation for an integer
>>>> underflow of the counter.
>>>
>>> What is the sequence of events that leads to this triggering the
>>> underflow? This information should be represent in the commit message.
>>>
>>
>> AFAIU this is:
>>
>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
>> for the first time, but since a locality is (unexpectedly) already active
>> check_locality() and consequently __tpm_tis_request_locality() return "true".
> 
> check_locality() returns true, but __tpm_tis_request_locality() returns
> the requested locality. Currently, this is always 0, so the check for
> !ret will always correctly indicate success and increment the
> locality_count.
> 

Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
passed from one function to another.
But this is rather code optimization and not really required to fix the reported bug.

As I already wrote in a former comment, the actual bug fix would IMHO simply be to
make sure that no localities are held at the beginning of tpm_tis_core_init():

for (i = 0; i <= MAX_LOCALITY; i++)
	__tpm_tis_relinquish_locality(priv, i);

before wait_startup() should be sufficient.

Regards,
Lino
Jarkko Sakkinen Feb. 20, 2024, 10:23 p.m. UTC | #8
On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> Hi,
>
> On 20.02.24 19:42, Alexander Steffen wrote:
> > ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
> > 
> > 
> > On 02.02.2024 04:08, Lino Sanfilippo wrote:
> >> On 01.02.24 23:21, Jarkko Sakkinen wrote:
> >>
> >>>
> >>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> >>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> >>>> locality request is allowed to be sent to the TPM. In the commit, the counter
> >>>> is indiscriminately decremented. Thus creating a situation for an integer
> >>>> underflow of the counter.
> >>>
> >>> What is the sequence of events that leads to this triggering the
> >>> underflow? This information should be represent in the commit message.
> >>>
> >>
> >> AFAIU this is:
> >>
> >> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
> >> for the first time, but since a locality is (unexpectedly) already active
> >> check_locality() and consequently __tpm_tis_request_locality() return "true".
> > 
> > check_locality() returns true, but __tpm_tis_request_locality() returns
> > the requested locality. Currently, this is always 0, so the check for
> > !ret will always correctly indicate success and increment the
> > locality_count.
> > 
>
> Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
> be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
> passed from one function to another.

Usually, or at least use cases I'm aware of, localities are per
component. E.g. Intel TXT has one and Linux has another.

There's been some proposals in the past here for hypervisor specific
locality here at LKML they didn't lead to anything.

If you are suggesting of removing "int l" parameter altogether, I
do support that idea.

> But this is rather code optimization and not really required to fix
> the reported bug.

Just adding here that I wish we also had a log transcript of bug, which
is right now missing. The explanation believable enough to move forward
but I still wish to see a log transcript.

A/B testing of the bug and fix is something I'm lacking here. If anyone
has ideas how to use QEMU to simulate what Intel TXT does with
localities that would help.

Most of us do not carry Intel TXT setup anywhere (home or office).

Also even tho 0/3 has an explanation bug 1/3 does not have anything at
all to make it to be counted as a bug fix. Pretty difficult to compare
any possible proposals for a fix on this playground tbh.

BR, Jarkko
Jarkko Sakkinen Feb. 20, 2024, 10:26 p.m. UTC | #9
On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> for (i = 0; i <= MAX_LOCALITY; i++)
> 	__tpm_tis_relinquish_locality(priv, i);

I'm pretty unfamiliar with Intel TXT so asking a dummy question:
if Intel TXT uses locality 2 I suppose we should not try to
relinquish it, or?

AFAIK, we don't have a symbol called MAX_LOCALITY.

BR, Jarkko
Jarkko Sakkinen Feb. 20, 2024, 10:31 p.m. UTC | #10
On Tue Feb 20, 2024 at 10:26 PM UTC, Jarkko Sakkinen wrote:
> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> > for (i = 0; i <= MAX_LOCALITY; i++)
> > 	__tpm_tis_relinquish_locality(priv, i);
>
> I'm pretty unfamiliar with Intel TXT so asking a dummy question:
> if Intel TXT uses locality 2 I suppose we should not try to
> relinquish it, or?
>
> AFAIK, we don't have a symbol called MAX_LOCALITY.

OK it was called TPM_MAX_LOCALITY :-) I had the patch set applied
in one branch but looked up with wrong symbol name.

So I reformalize my question to two parts:

1. Why does TXT leave locality 2 open in the first place? I did
   not see explanation. Isn't this a bug in TXT?
2. Because localities are not too useful these days given TPM2's
   policy mechanism I cannot recall out of top of my head can
   you have two localities open at same time. So what kind of
   conflict happens when you try to open locality 0 and have
   locality 2 open?

BR, Jarkko
Ross Philipson Feb. 20, 2024, 10:57 p.m. UTC | #11
On 2/20/24 2:26 PM, Jarkko Sakkinen wrote:
> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
>> for (i = 0; i <= MAX_LOCALITY; i++)
>> 	__tpm_tis_relinquish_locality(priv, i);
> 
> I'm pretty unfamiliar with Intel TXT so asking a dummy question:
> if Intel TXT uses locality 2 I suppose we should not try to
> relinquish it, or?

The TPM has five localities (0 - 4). Localities 1 - 4 are for DRTM 
support. For TXT, locality 4 is hard wired to the CPU - nothing else can 
touch it. Locality 3 is only ever accessible when the CPU is executing 
an AC (Authenticated Code) module. That leaves 1 and 2 for the DRTM 
software environment to use. If the DRTM software opens 1 or 2, it 
should close them before exiting the DRTM.

> 
> AFAIK, we don't have a symbol called MAX_LOCALITY.

Daniel added it in the patch set.

Thanks
Ross

> 
> BR, Jarkko
Jarkko Sakkinen Feb. 20, 2024, 11:10 p.m. UTC | #12
On Tue Feb 20, 2024 at 10:57 PM UTC,  wrote:
> On 2/20/24 2:26 PM, Jarkko Sakkinen wrote:
> > On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> >> for (i = 0; i <= MAX_LOCALITY; i++)
> >> 	__tpm_tis_relinquish_locality(priv, i);
> > 
> > I'm pretty unfamiliar with Intel TXT so asking a dummy question:
> > if Intel TXT uses locality 2 I suppose we should not try to
> > relinquish it, or?
>
> The TPM has five localities (0 - 4). Localities 1 - 4 are for DRTM 
> support. For TXT, locality 4 is hard wired to the CPU - nothing else can 

Locality 4 is familiar because it comes across from time to time.

If I recall correctly DRTM should use only localities 3-4 and 
localities 0-2 should be reserved for the OS use.

So this does not match what I recall unfortunately but I'm not
really expert with this stuff.

The patches has zero explanations SINIT ACM's behaviour on
locality use and without that this cannot move forward because
there's neither way to reproduce any of this.

Actually there's zero effort on anything related to SINIT.

> an AC (Authenticated Code) module. That leaves 1 and 2 for the DRTM 
> software environment to use. If the DRTM software opens 1 or 2, it 
> should close them before exiting the DRTM.
>
> > 
> > AFAIK, we don't have a symbol called MAX_LOCALITY.
>
> Daniel added it in the patch set.

Got this, my symbol lookup just failed in my Git tree but looking at
the patch set there was a symbol called *TPM_*MAX_LOCALITY :-)

BR, Jarkko
Jarkko Sakkinen Feb. 20, 2024, 11:13 p.m. UTC | #13
On Tue Feb 20, 2024 at 11:10 PM UTC, Jarkko Sakkinen wrote
> On Tue Feb 20, 2024 at 10:57 PM UTC,  wrote:
> > On 2/20/24 2:26 PM, Jarkko Sakkinen wrote:
> > > On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> > >> for (i = 0; i <= MAX_LOCALITY; i++)
> > >> 	__tpm_tis_relinquish_locality(priv, i);
> > > 
> > > I'm pretty unfamiliar with Intel TXT so asking a dummy question:
> > > if Intel TXT uses locality 2 I suppose we should not try to
> > > relinquish it, or?
> >
> > The TPM has five localities (0 - 4). Localities 1 - 4 are for DRTM 
> > support. For TXT, locality 4 is hard wired to the CPU - nothing else can 
>
> Locality 4 is familiar because it comes across from time to time.
>
> If I recall correctly DRTM should use only localities 3-4 and 
> localities 0-2 should be reserved for the OS use.
>
> So this does not match what I recall unfortunately but I'm not
> really expert with this stuff.
>
> The patches has zero explanations SINIT ACM's behaviour on
> locality use and without that this cannot move forward because
> there's neither way to reproduce any of this.
>
> Actually there's zero effort on anything related to SINIT.

To put short we need a clearer sequence what Intel TXT does
causing leaving localities does. If that is nailed to less
abstract description then we can review this.

If we know this blackbox model then there's a chance to make
simulation of it e.g. with QEMU.

Alternatively we need a more trivial scenario to reproduce
a bug in locality handling than Intel TXT. There's no enough
beef ATM to really make good decisions what sort of code change
would be best for Linux.

BR, Jarkko
Lino Sanfilippo Feb. 20, 2024, 11:19 p.m. UTC | #14
On 20.02.24 23:23, Jarkko Sakkinen wrote:
> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
>> Hi,
>>
>> On 20.02.24 19:42, Alexander Steffen wrote:
>>> ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
>>>
>>>
>>> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>>>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>>>
>>>>>
>>>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>>>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
>>>>>> is indiscriminately decremented. Thus creating a situation for an integer
>>>>>> underflow of the counter.
>>>>>
>>>>> What is the sequence of events that leads to this triggering the
>>>>> underflow? This information should be represent in the commit message.
>>>>>
>>>>
>>>> AFAIU this is:
>>>>
>>>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
>>>> for the first time, but since a locality is (unexpectedly) already active
>>>> check_locality() and consequently __tpm_tis_request_locality() return "true".
>>>
>>> check_locality() returns true, but __tpm_tis_request_locality() returns
>>> the requested locality. Currently, this is always 0, so the check for
>>> !ret will always correctly indicate success and increment the
>>> locality_count.
>>>
>>
>> Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
>> be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
>> passed from one function to another.
>
> Usually, or at least use cases I'm aware of, localities are per
> component. E.g. Intel TXT has one and Linux has another.
>
> There's been some proposals in the past here for hypervisor specific
> locality here at LKML they didn't lead to anything.
>
> If you are suggesting of removing "int l" parameter altogether, I
> do support that idea.
>

Yes, removing the "l" parameter is what I meant. I can prepare a patch for
the removal.

Regards,
Lino

>> But this is rather code optimization and not really required to fix
>> the reported bug.
>
> Just adding here that I wish we also had a log transcript of bug, which
> is right now missing. The explanation believable enough to move forward
> but I still wish to see a log transcript.
>
> A/B testing of the bug and fix is something I'm lacking here. If anyone
> has ideas how to use QEMU to simulate what Intel TXT does with
> localities that would help.
>
> Most of us do not carry Intel TXT setup anywhere (home or office).
>
> Also even tho 0/3 has an explanation bug 1/3 does not have anything at
> all to make it to be counted as a bug fix. Pretty difficult to compare
> any possible proposals for a fix on this playground tbh.
>
> BR, Jarkko
>
Lino Sanfilippo Feb. 20, 2024, 11:26 p.m. UTC | #15
On 20.02.24 23:31, Jarkko Sakkinen wrote:
> ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
> 
> 
> On Tue Feb 20, 2024 at 10:26 PM UTC, Jarkko Sakkinen wrote:
>> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
>>> for (i = 0; i <= MAX_LOCALITY; i++)
>>>     __tpm_tis_relinquish_locality(priv, i);
>>
>> I'm pretty unfamiliar with Intel TXT so asking a dummy question:
>> if Intel TXT uses locality 2 I suppose we should not try to
>> relinquish it, or?
>>
>> AFAIK, we don't have a symbol called MAX_LOCALITY.
> 
> OK it was called TPM_MAX_LOCALITY :-) I had the patch set applied
> in one branch but looked up with wrong symbol name.
> 

Sorry for the confusion. The code was just meant to sketch a solution, it was 
written out of my head and I just remembered that Daniels patch set introduced
some define for the max number of the localities. I did not look up the correct
name though.
Jarkko Sakkinen Feb. 21, 2024, 12:40 a.m. UTC | #16
On Tue Feb 20, 2024 at 11:19 PM UTC, Lino Sanfilippo wrote:
>
>
> On 20.02.24 23:23, Jarkko Sakkinen wrote:
> > On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> >> Hi,
> >>
> >> On 20.02.24 19:42, Alexander Steffen wrote:
> >>> ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
> >>>
> >>>
> >>> On 02.02.2024 04:08, Lino Sanfilippo wrote:
> >>>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
> >>>>
> >>>>>
> >>>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> >>>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> >>>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
> >>>>>> is indiscriminately decremented. Thus creating a situation for an integer
> >>>>>> underflow of the counter.
> >>>>>
> >>>>> What is the sequence of events that leads to this triggering the
> >>>>> underflow? This information should be represent in the commit message.
> >>>>>
> >>>>
> >>>> AFAIU this is:
> >>>>
> >>>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
> >>>> for the first time, but since a locality is (unexpectedly) already active
> >>>> check_locality() and consequently __tpm_tis_request_locality() return "true".
> >>>
> >>> check_locality() returns true, but __tpm_tis_request_locality() returns
> >>> the requested locality. Currently, this is always 0, so the check for
> >>> !ret will always correctly indicate success and increment the
> >>> locality_count.
> >>>
> >>
> >> Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
> >> be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
> >> passed from one function to another.
> >
> > Usually, or at least use cases I'm aware of, localities are per
> > component. E.g. Intel TXT has one and Linux has another.
> >
> > There's been some proposals in the past here for hypervisor specific
> > locality here at LKML they didn't lead to anything.
> >
> > If you are suggesting of removing "int l" parameter altogether, I
> > do support that idea.
> >
>
> Yes, removing the "l" parameter is what I meant. I can prepare a patch for
> the removal.

This change BTW does not need to be supported by any bug per se as
removing useless code is always welcome.

If we wanted ever use let's say separate locality for hypervisor,
we would want to design such feature from ground up. I don't think
this will happen tho since localities are sort of trend that died
with TPM 1.2... It had only authorization value and locality brought
some additional granularity to it.

As for this patch set I also don't think kernel should care about
localities beyond 2 or at least not ever try relinquish them.

I.e. it should at most relinquish localities 0-2. The only action
taken for 3-4 should really be perhaps rollbacking the driver init
and report to klog that these localities have been left open by
the platform.

BR, Jarkko
Jarkko Sakkinen Feb. 21, 2024, 12:42 a.m. UTC | #17
On Tue Feb 20, 2024 at 11:26 PM UTC, Lino Sanfilippo wrote:
>
>
> On 20.02.24 23:31, Jarkko Sakkinen wrote:
> > ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
> > 
> > 
> > On Tue Feb 20, 2024 at 10:26 PM UTC, Jarkko Sakkinen wrote:
> >> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> >>> for (i = 0; i <= MAX_LOCALITY; i++)
> >>>     __tpm_tis_relinquish_locality(priv, i);
> >>
> >> I'm pretty unfamiliar with Intel TXT so asking a dummy question:
> >> if Intel TXT uses locality 2 I suppose we should not try to
> >> relinquish it, or?
> >>
> >> AFAIK, we don't have a symbol called MAX_LOCALITY.
> > 
> > OK it was called TPM_MAX_LOCALITY :-) I had the patch set applied
> > in one branch but looked up with wrong symbol name.
> > 
>
> Sorry for the confusion. The code was just meant to sketch a solution, it was 
> written out of my head and I just remembered that Daniels patch set introduced
> some define for the max number of the localities. I did not look up the correct
> name though.

NP

BR, Jarkko
James Bottomley Feb. 21, 2024, 12:37 p.m. UTC | #18
On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
> 
> 2. Because localities are not too useful these days given TPM2's
>    policy mechanism

Localitites are useful to the TPM2 policy mechanism.  When we get key
policy in the kernel it will give us a way to create TPM wrapped keys
that can only be unwrapped in the kernel if we run the kernel in a
different locality from userspace (I already have demo patches doing
this).

>  I cannot recall out of top of my head can
>    you have two localities open at same time.

I think there's a misunderstanding about what localities are: they're
effectively an additional platform supplied tag to a command.  Each
command can therefore have one and only one locality.  The TPM doesn't
have a concept of which locality is open; if you look at the reference
implementation, the simulator has a __plat__LocalitySet() function
which places all commands in the just set locality until you change to
a different one.

However, since the way localities are implemented (i.e. what triggers
_plat__LocalitySet()) is implementation defined, each physical TPM
device has a different way of doing the set (for instance, for TIS
TPM's locality is a function of the port set used to address the TPM;
for CRB TPMs it can be an additional tag on the buffer for command
submission).   I think the locality request/relinquish was modelled
after some other HW, but I don't know what.

James
Jarkko Sakkinen Feb. 21, 2024, 7:43 p.m. UTC | #19
On Wed Feb 21, 2024 at 12:37 PM UTC, James Bottomley wrote:
> On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
> > 
> > 2. Because localities are not too useful these days given TPM2's
> >    policy mechanism
>
> Localitites are useful to the TPM2 policy mechanism.  When we get key
> policy in the kernel it will give us a way to create TPM wrapped keys
> that can only be unwrapped in the kernel if we run the kernel in a
> different locality from userspace (I already have demo patches doing
> this).

Let's keep this discussion in scope, please.

Removing useless code using registers that you might have some actually
useful use is not wrong thing to do. It is better to look at things from
clean slate when the time comes.

> >  I cannot recall out of top of my head can
> >    you have two localities open at same time.
>
> I think there's a misunderstanding about what localities are: they're
> effectively an additional platform supplied tag to a command.  Each
> command can therefore have one and only one locality.  The TPM doesn't

Actually this was not unclear at all. I even read the chapters from
Ariel Segall's yesterday as a refresher.

I was merely asking that if TPM_ACCESS_X is not properly cleared and you
se TPM_ACCESS_Y where Y < X how does the hardware react as the bug
report is pretty open ended and not very clear of the steps leading to
unwanted results.

With a quick check from [1] could not spot the conflict reaction but
it is probably there.

> submission).   I think the locality request/relinquish was modelled
> after some other HW, but I don't know what.

My wild guess: first implementation was made when TPM's became available
and there was no analytical thinking other than getting something that
runs :-)

> James

[1] https://trustedcomputinggroup.org/wp-content/uploads/PC-Client-Specific-Platform-TPM-Profile-for-TPM-2p0-v1p05p_r14_pub.pdf

BR, Jarkko
Jarkko Sakkinen Feb. 21, 2024, 7:45 p.m. UTC | #20
On Wed Feb 21, 2024 at 7:43 PM UTC, Jarkko Sakkinen wrote:
> On Wed Feb 21, 2024 at 12:37 PM UTC, James Bottomley wrote:
> > On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
> > > 
> > > 2. Because localities are not too useful these days given TPM2's
> > >    policy mechanism
> >
> > Localitites are useful to the TPM2 policy mechanism.  When we get key
> > policy in the kernel it will give us a way to create TPM wrapped keys
> > that can only be unwrapped in the kernel if we run the kernel in a
> > different locality from userspace (I already have demo patches doing
> > this).
>
> Let's keep this discussion in scope, please.
>
> Removing useless code using registers that you might have some actually
> useful use is not wrong thing to do. It is better to look at things from
> clean slate when the time comes.
>
> > >  I cannot recall out of top of my head can
> > >    you have two localities open at same time.
> >
> > I think there's a misunderstanding about what localities are: they're
> > effectively an additional platform supplied tag to a command.  Each
> > command can therefore have one and only one locality.  The TPM doesn't
>
> Actually this was not unclear at all. I even read the chapters from
> Ariel Segall's yesterday as a refresher.

Refering to https://www.amazon.com/Trusted-Platform-Modules-Computing-Networks/dp/1849198934

SBR, Jarkko
James Bottomley Feb. 22, 2024, 9:06 a.m. UTC | #21
On Wed, 2024-02-21 at 19:43 +0000, Jarkko Sakkinen wrote:
> On Wed Feb 21, 2024 at 12:37 PM UTC, James Bottomley wrote:
> > On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
[...]
> > >  I cannot recall out of top of my head can
> > >    you have two localities open at same time.
> > 
> > I think there's a misunderstanding about what localities are:
> > they're effectively an additional platform supplied tag to a
> > command.  Each command can therefore have one and only one
> > locality.  The TPM doesn't
> 
> Actually this was not unclear at all. I even read the chapters from
> Ariel Segall's yesterday as a refresher.
> 
> I was merely asking that if TPM_ACCESS_X is not properly cleared and
> you se TPM_ACCESS_Y where Y < X how does the hardware react as the
> bug report is pretty open ended and not very clear of the steps
> leading to unwanted results.

So TPM_ACCESS_X is *not* a generic TPM thing, it's a TIS interface
specific thing.  Now the TIS interface seems to be dominating, so
perhaps it is the correct programming model for us to follow, but not
all current TPMs adhere to it.

> With a quick check from [1] could not spot the conflict reaction but
> it is probably there.

The way platforms should handle localities is now detailed in the TCG
library code snippets (part 4 Supporting Routines - Code):

https://trustedcomputinggroup.org/resource/tpm-library-specification/

It's the _plat__LocalityGet/Set in Appendix C

The implementation documented there is what the TPM reference
implementation follows.

James
Jarkko Sakkinen Feb. 22, 2024, 11:49 p.m. UTC | #22
On Thu Feb 22, 2024 at 11:06 AM EET, James Bottomley wrote:
> On Wed, 2024-02-21 at 19:43 +0000, Jarkko Sakkinen wrote:
> > On Wed Feb 21, 2024 at 12:37 PM UTC, James Bottomley wrote:
> > > On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
> [...]
> > > >  I cannot recall out of top of my head can
> > > >    you have two localities open at same time.
> > > 
> > > I think there's a misunderstanding about what localities are:
> > > they're effectively an additional platform supplied tag to a
> > > command.  Each command can therefore have one and only one
> > > locality.  The TPM doesn't
> > 
> > Actually this was not unclear at all. I even read the chapters from
> > Ariel Segall's yesterday as a refresher.
> > 
> > I was merely asking that if TPM_ACCESS_X is not properly cleared and
> > you se TPM_ACCESS_Y where Y < X how does the hardware react as the
> > bug report is pretty open ended and not very clear of the steps
> > leading to unwanted results.
>
> So TPM_ACCESS_X is *not* a generic TPM thing, it's a TIS interface
> specific thing.  Now the TIS interface seems to be dominating, so
> perhaps it is the correct programming model for us to follow, but not
> all current TPMs adhere to it.

I know, I only have CRB based TPMs in my host machines but here the
context is TIS interface so in this scope it's what we care about.

We're trying to fix a bug here, not speculate what additional
features could be done with localities.

BR, Jarkko
Jarkko Sakkinen Feb. 23, 2024, 12:01 a.m. UTC | #23
On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> locality request is allowed to be sent to the TPM. In the commit, the counter
> is indiscriminately decremented. Thus creating a situation for an integer
> underflow of the counter.
>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
> Reported-by: Kanth Ghatraju <kanth.ghatraju@oracle.com>
> Fixes: 933bfc5ad213 ("tpm, tpm: Implement usage counter for locality")
> ---
>  drivers/char/tpm/tpm_tis_core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index 1b350412d8a6..4176d3bd1f04 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -180,7 +180,8 @@ static int tpm_tis_relinquish_locality(struct tpm_chip *chip, int l)
>  	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>  
>  	mutex_lock(&priv->locality_count_mutex);
> -	priv->locality_count--;
> +	if (priv->locality_count > 0)
> +		priv->locality_count--;
>  	if (priv->locality_count == 0)
>  		__tpm_tis_relinquish_locality(priv, l);
>  	mutex_unlock(&priv->locality_count_mutex);

To make this patch set better the whole story should be scenario based.

Starting from cover letter the explanation is way too rounded to guide
to the conclusion that these are actually best possible code changes to
fix the issue. 

I agree fully on that the problem should be fixed but given that the
scenarios are fuzzy deciding whether this the right things done right
is the open question.

I.e. we need the steps for destruction and how these patches change
those steps to make this right. Since the whole topic is complicated
I'd use PC Client spec as reference.

BR, Jarkko
Daniel P. Smith Feb. 23, 2024, 1:55 a.m. UTC | #24
On 2/20/24 13:42, Alexander Steffen wrote:
> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>
>>>
>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>> Commit 933bfc5ad213 introduced the use of a locality counter to 
>>>> control when a
>>>> locality request is allowed to be sent to the TPM. In the commit, 
>>>> the counter
>>>> is indiscriminately decremented. Thus creating a situation for an 
>>>> integer
>>>> underflow of the counter.
>>>
>>> What is the sequence of events that leads to this triggering the
>>> underflow? This information should be represent in the commit message.
>>>
>>
>> AFAIU this is:
>>
>> 1. We start with a locality_counter of 0 and then we call 
>> tpm_tis_request_locality()
>> for the first time, but since a locality is (unexpectedly) already active
>> check_locality() and consequently __tpm_tis_request_locality() return 
>> "true".
> 
> check_locality() returns true, but __tpm_tis_request_locality() returns 
> the requested locality. Currently, this is always 0, so the check for 
> !ret will always correctly indicate success and increment the 
> locality_count.
> 
> But since theoretically a locality != 0 could be requested, the correct 
> fix would be to check for something like ret >= 0 or ret == l instead of 
> !ret. Then the counter will also be incremented correctly for localities 
> != 0, and no underflow will happen later on. Therefore, explicitly 
> checking for an underflow is unnecessary and hides the real problem.
> 

My apologies, but I will have to humbly disagree from a fundamental 
level here. If a state variable has bounds, then those bounds should be 
enforced when the variable is being manipulated. Assuming that every 
path leading to the variable manipulation code has ensured proper 
manipulation is just that, an assumption. When assumptions fail is how 
bugs and vulnerabilities occur.

To your point, does this full address the situation experienced, I would 
say it does not. IMHO, the situation is really a combination of both 
patch 1 and patch 2, but the request was to split the changes for 
individual discussion. We selected this one as being the fixes for two 
reasons. First, it blocks the underflow such that when the Secure Launch 
series opens Locality 2, it will get incremented at that time and the 
internal locality tracking state variables will end up with the correct 
values. Thus leading to the relinquish succeeding at kernel shutdown. 
Second, it provides a stronger defensive coding practice.

Another reason that this works as a fix is that the TPM specification 
requires the registers to be mirrored across all localities, regardless 
of the active locality. While all the request/relinquishes for Locality 
0 sent by the early code do not succeed, obtaining the values via the 
Locality 0 registers are still guaranteed to be correct.

v/r,
dps
Daniel P. Smith Feb. 23, 2024, 1:56 a.m. UTC | #25
On 2/20/24 15:54, Lino Sanfilippo wrote:
> Hi,
> 
> On 20.02.24 19:42, Alexander Steffen wrote:
>> ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
>>
>>
>> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>>
>>>>
>>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
>>>>> is indiscriminately decremented. Thus creating a situation for an integer
>>>>> underflow of the counter.
>>>>
>>>> What is the sequence of events that leads to this triggering the
>>>> underflow? This information should be represent in the commit message.
>>>>
>>>
>>> AFAIU this is:
>>>
>>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
>>> for the first time, but since a locality is (unexpectedly) already active
>>> check_locality() and consequently __tpm_tis_request_locality() return "true".
>>
>> check_locality() returns true, but __tpm_tis_request_locality() returns
>> the requested locality. Currently, this is always 0, so the check for
>> !ret will always correctly indicate success and increment the
>> locality_count.
>>
> 
> Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
> be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
> passed from one function to another.
> But this is rather code optimization and not really required to fix the reported bug.

Actually, doing so will break the TPM API. The function 
tpm_tis_request_locality() is registered as the locality handler, 
  int (*request_locality)(struct tpm_chip *chip, int loc), in the tis 
instance of struct tpm_class_ops{}. This is the API used by the Secure 
Launch series to open Locality2 for the measurements it must record.

v/r,
dps
Daniel P. Smith Feb. 23, 2024, 1:57 a.m. UTC | #26
On 2/20/24 17:31, Jarkko Sakkinen wrote:
> On Tue Feb 20, 2024 at 10:26 PM UTC, Jarkko Sakkinen wrote:
>> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
>>> for (i = 0; i <= MAX_LOCALITY; i++)
>>> 	__tpm_tis_relinquish_locality(priv, i);
>>
>> I'm pretty unfamiliar with Intel TXT so asking a dummy question:
>> if Intel TXT uses locality 2 I suppose we should not try to
>> relinquish it, or?
>>
>> AFAIK, we don't have a symbol called MAX_LOCALITY.
> 
> OK it was called TPM_MAX_LOCALITY :-) I had the patch set applied
> in one branch but looked up with wrong symbol name.
> 
> So I reformalize my question to two parts:
> 
> 1. Why does TXT leave locality 2 open in the first place? I did
>     not see explanation. Isn't this a bug in TXT?

It does so because that is what the TCG D-RTM specification requires. 
See Section 5.3.4.10 of the TCG D-RTM specification[1], the first 
requirement is, "The DLME SHALL receive control with access to Locality 2."

> 2. Because localities are not too useful these days given TPM2's
>     policy mechanism I cannot recall out of top of my head can
>     you have two localities open at same time. So what kind of
>     conflict happens when you try to open locality 0 and have
>     locality 2 open?

I would disagree and would call your attention to the TCG's 
definition/motivation for localities, Section 3.2 of Client PTP 
specification[2].

"“Locality” is an assertion to the TPM that a command’s source is 
associated with a particular component. Locality can be thought of as a 
hardware-based authorization. The TPM is not actually aware of the 
nature of the relationship between the locality and the component. The 
ability to reset and extend notwithstanding, it is important to note 
that, from a PCR “usage” perspective, there is no hierarchical 
relationship between different localities. The TPM simply enforces 
locality restrictions on TPM assets (such as PCR or SEALed blobs)."

As stated, from the TPM specification perspective, it is not aware of 
this mapping to components and leaves it to the platform to enforce.

"The protection and separation of the localities (and therefore the 
association with the associated components) is entirely the 
responsibility of the platform components. Platform components, 
including the OS, may provide the separation of localities using 
protection mechanisms such as virtual memory or paging."

The x86 manufactures opted to adopt the D-RTM specification which 
defines the components as follows:

Locality 4: Usually associated with the CPU executing microcode. This is 
used to establish the Dynamic RTM.
Locality 3: Auxiliary components. Use of this is optional and, if used, 
it is implementation dependent.
Locality 2: Dynamically Launched OS (Dynamic OS) “runtime” environment.
Locality 1: An environment for use by the Dynamic OS.
Locality 0: The Static RTM, its chain of trust and its environment.

And the means to protect and separate those localities are encoded in 
the x86 chipset, i.e A D-RTM Event must be used to access any of the 
D-RTM Localities (Locality1 - Locality4).

For Intel, Locality 4 can only be accessed when a dedicated signal 
between the CPU and the chipset is raised, thus only allowing the CPU to 
utilize Locality 4. The CPU will then close Locality 4, authenticate and 
give control to the ACM with access to Locality 3. When the ACM is 
complete, it will instruct the chipset to lock Locality 3 and give 
control to the DLME (MLE in Intel parlance) with Locality 2 open. It is 
up to the DLME, the Linux kernel in this case, to decide how to assign 
components to Locality 1 and 2.

As to proposals to utilize localities by the Linux kernel, the only one 
I was aware of was dropped because they couldn't open the higher localities.

I would also highlight that the D-RTM implementation guide for Arm 
allows for a hardware D-RTM event, which the vendor may choose to 
implement a hardware/CPU enforced access to TPM localities. Thus, the 
ability to support localities will also become a requirement for certain 
Arm CPUs.

[1] 
https://trustedcomputinggroup.org/wp-content/uploads/TCG_D-RTM_Architecture_v1-0_Published_06172013.pdf
[2] 
https://trustedcomputinggroup.org/wp-content/uploads/PC-Client-Specific-Platform-TPM-Profile-for-TPM-2p0-v1p05p_r14_pub.pdf
Daniel P. Smith Feb. 23, 2024, 1:57 a.m. UTC | #27
On 2/21/24 14:43, Jarkko Sakkinen wrote:
> On Wed Feb 21, 2024 at 12:37 PM UTC, James Bottomley wrote:
>> On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
>>>
>>> 2. Because localities are not too useful these days given TPM2's
>>>     policy mechanism
>>
>> Localitites are useful to the TPM2 policy mechanism.  When we get key
>> policy in the kernel it will give us a way to create TPM wrapped keys
>> that can only be unwrapped in the kernel if we run the kernel in a
>> different locality from userspace (I already have demo patches doing
>> this).
> 
> Let's keep this discussion in scope, please.
> 
> Removing useless code using registers that you might have some actually
> useful use is not wrong thing to do. It is better to look at things from
> clean slate when the time comes.
> 
>>>   I cannot recall out of top of my head can
>>>     you have two localities open at same time.
>>
>> I think there's a misunderstanding about what localities are: they're
>> effectively an additional platform supplied tag to a command.  Each
>> command can therefore have one and only one locality.  The TPM doesn't
> 
> Actually this was not unclear at all. I even read the chapters from
> Ariel Segall's yesterday as a refresher.
> 
> I was merely asking that if TPM_ACCESS_X is not properly cleared and you
> se TPM_ACCESS_Y where Y < X how does the hardware react as the bug
> report is pretty open ended and not very clear of the steps leading to
> unwanted results.
> 
> With a quick check from [1] could not spot the conflict reaction but
> it is probably there.

The expected behavior is explained in the Informative Comment of section 
6.5.2.4 of the Client PTP spec[1]:

"The purpose of this register is to allow the processes operating at the 
various localities to share the TPM. The basic notion is that any 
locality can request access to the TPM by setting the 
TPM_ACCESS_x.requestUse field using its assigned TPM_ACCESS_x register 
address. If there is no currently set locality, the TPM sets current 
locality to the requesting one and allows operations only from that 
locality. If the TPM is currently at another locality, the TPM keeps the 
request pending until the currently executing locality frees the TPM. 
Software relinquishes the TPM’s locality by writing a 1 to the 
TPM_ACCESS_x.activeLocality field. Upon release, the TPM honors the 
highest locality request pending. If there is no pending request, the 
TPM enters the “free” state."

>> submission).   I think the locality request/relinquish was modelled
>> after some other HW, but I don't know what.
> 
> My wild guess: first implementation was made when TPM's became available
> and there was no analytical thinking other than getting something that
> runs :-)

Actually, no that is not how it was done. IIRC, localities were designed 
in conjunction with D-RTM when Intel and MS started the LeGrande effort 
back in 2000. It was then generalized for the TPM 1.1b specification. My 
first introduction to LeGrande/TXT wasn't until 2005 as part of an early 
access program. So most of my historical understanding is from 
discussions I luckily got to have with one of the architects and a few 
of the original TCG committee members.

[1] 
https://trustedcomputinggroup.org/wp-content/uploads/PC-Client-Specific-Platform-TPM-Profile-for-TPM-2p0-v1p05p_r14_pub.pdf

v/r,
dps
Daniel P. Smith Feb. 23, 2024, 1:58 a.m. UTC | #28
On 2/20/24 17:23, Jarkko Sakkinen wrote:
> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
>> Hi,
>>
>> On 20.02.24 19:42, Alexander Steffen wrote:
>>> ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
>>>
>>>
>>> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>>>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>>>
>>>>>
>>>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>>>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
>>>>>> is indiscriminately decremented. Thus creating a situation for an integer
>>>>>> underflow of the counter.
>>>>>
>>>>> What is the sequence of events that leads to this triggering the
>>>>> underflow? This information should be represent in the commit message.
>>>>>
>>>>
>>>> AFAIU this is:
>>>>
>>>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
>>>> for the first time, but since a locality is (unexpectedly) already active
>>>> check_locality() and consequently __tpm_tis_request_locality() return "true".
>>>
>>> check_locality() returns true, but __tpm_tis_request_locality() returns
>>> the requested locality. Currently, this is always 0, so the check for
>>> !ret will always correctly indicate success and increment the
>>> locality_count.
>>>
>>
>> Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
>> be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
>> passed from one function to another.
> 
> Usually, or at least use cases I'm aware of, localities are per
> component. E.g. Intel TXT has one and Linux has another.
> 
> There's been some proposals in the past here for hypervisor specific
> locality here at LKML they didn't lead to anything.
> 
> If you are suggesting of removing "int l" parameter altogether, I
> do support that idea.
> 
>> But this is rather code optimization and not really required to fix
>> the reported bug.
> 
> Just adding here that I wish we also had a log transcript of bug, which
> is right now missing. The explanation believable enough to move forward
> but I still wish to see a log transcript.

That will be forth coming.

v/r,
dps
Jarkko Sakkinen Feb. 23, 2024, 12:58 p.m. UTC | #29
On Fri Feb 23, 2024 at 3:58 AM EET, Daniel P. Smith wrote:
> > Just adding here that I wish we also had a log transcript of bug, which
> > is right now missing. The explanation believable enough to move forward
> > but I still wish to see a log transcript.
>
> That will be forth coming.

I did not respond yet to other responses that you've given in the past 
12'ish hours or so (just woke up) but I started to think how all this
great and useful information would be best kept in memory. Some of it
has been discussed in the past but there is lot of small details that
are too easily forgotten.

I'd think the best "documentation" approach here would be inject the
spec references to the sites where locality behaviour is changed so
that it is easy in future cross-reference them, and least of risk
of having code changes that would break anything. I think this way
all the information that you provided is best preserved for the
future.

Thanks a lot for great and informative responses!

> v/r,
> dps

BR, Jarkko
Jarkko Sakkinen Feb. 23, 2024, 8:40 p.m. UTC | #30
On Fri Feb 23, 2024 at 3:57 AM EET, Daniel P. Smith wrote:
> On 2/21/24 14:43, Jarkko Sakkinen wrote:
> > On Wed Feb 21, 2024 at 12:37 PM UTC, James Bottomley wrote:
> >> On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
> >>>
> >>> 2. Because localities are not too useful these days given TPM2's
> >>>     policy mechanism
> >>
> >> Localitites are useful to the TPM2 policy mechanism.  When we get key
> >> policy in the kernel it will give us a way to create TPM wrapped keys
> >> that can only be unwrapped in the kernel if we run the kernel in a
> >> different locality from userspace (I already have demo patches doing
> >> this).
> > 
> > Let's keep this discussion in scope, please.
> > 
> > Removing useless code using registers that you might have some actually
> > useful use is not wrong thing to do. It is better to look at things from
> > clean slate when the time comes.
> > 
> >>>   I cannot recall out of top of my head can
> >>>     you have two localities open at same time.
> >>
> >> I think there's a misunderstanding about what localities are: they're
> >> effectively an additional platform supplied tag to a command.  Each
> >> command can therefore have one and only one locality.  The TPM doesn't
> > 
> > Actually this was not unclear at all. I even read the chapters from
> > Ariel Segall's yesterday as a refresher.
> > 
> > I was merely asking that if TPM_ACCESS_X is not properly cleared and you
> > se TPM_ACCESS_Y where Y < X how does the hardware react as the bug
> > report is pretty open ended and not very clear of the steps leading to
> > unwanted results.
> > 
> > With a quick check from [1] could not spot the conflict reaction but
> > it is probably there.
>
> The expected behavior is explained in the Informative Comment of section 
> 6.5.2.4 of the Client PTP spec[1]:
>
> "The purpose of this register is to allow the processes operating at the 
> various localities to share the TPM. The basic notion is that any 
> locality can request access to the TPM by setting the 
> TPM_ACCESS_x.requestUse field using its assigned TPM_ACCESS_x register 
> address. If there is no currently set locality, the TPM sets current 
> locality to the requesting one and allows operations only from that 
> locality. If the TPM is currently at another locality, the TPM keeps the 
> request pending until the currently executing locality frees the TPM. 

Right.

I'd think it would make sense to document the basic dance like this as
part of kdoc for request_locality:

* Setting TPM_ACCESS_x.requestUse:
*  1. No locality reserved => set locality.
*  2. Locality reserved => set pending.

I.e. easy reminder with enough granularity.

> Software relinquishes the TPM’s locality by writing a 1 to the 
> TPM_ACCESS_x.activeLocality field. Upon release, the TPM honors the 
> highest locality request pending. If there is no pending request, the 
> TPM enters the “free” state."

And this for relinquish_locality:

* Setting TPM_ACCESS_x.activeLocality:
*  1. No locality pending => free.
*  2. Localities pending => reserve for highest.

> >> submission).   I think the locality request/relinquish was modelled
> >> after some other HW, but I don't know what.
> > 
> > My wild guess: first implementation was made when TPM's became available
> > and there was no analytical thinking other than getting something that
> > runs :-)
>
> Actually, no that is not how it was done. IIRC, localities were designed 
> in conjunction with D-RTM when Intel and MS started the LeGrande effort 
> back in 2000. It was then generalized for the TPM 1.1b specification. My 

OK, thanks for this bit of information! I did not know this.

> first introduction to LeGrande/TXT wasn't until 2005 as part of an early 
> access program. So most of my historical understanding is from 
> discussions I luckily got to have with one of the architects and a few 
> of the original TCG committee members.

Thanks alot for sharing this.

>
> [1] 
> https://trustedcomputinggroup.org/wp-content/uploads/PC-Client-Specific-Platform-TPM-Profile-for-TPM-2p0-v1p05p_r14_pub.pdf
>
> v/r,
> dps


BR, Jarkko
Jarkko Sakkinen Feb. 23, 2024, 8:42 p.m. UTC | #31
On Fri Feb 23, 2024 at 10:40 PM EET, Jarkko Sakkinen wrote:
> On Fri Feb 23, 2024 at 3:57 AM EET, Daniel P. Smith wrote:
> > On 2/21/24 14:43, Jarkko Sakkinen wrote:
> > > On Wed Feb 21, 2024 at 12:37 PM UTC, James Bottomley wrote:
> > >> On Tue, 2024-02-20 at 22:31 +0000, Jarkko Sakkinen wrote:
> > >>>
> > >>> 2. Because localities are not too useful these days given TPM2's
> > >>>     policy mechanism
> > >>
> > >> Localitites are useful to the TPM2 policy mechanism.  When we get key
> > >> policy in the kernel it will give us a way to create TPM wrapped keys
> > >> that can only be unwrapped in the kernel if we run the kernel in a
> > >> different locality from userspace (I already have demo patches doing
> > >> this).
> > > 
> > > Let's keep this discussion in scope, please.
> > > 
> > > Removing useless code using registers that you might have some actually
> > > useful use is not wrong thing to do. It is better to look at things from
> > > clean slate when the time comes.
> > > 
> > >>>   I cannot recall out of top of my head can
> > >>>     you have two localities open at same time.
> > >>
> > >> I think there's a misunderstanding about what localities are: they're
> > >> effectively an additional platform supplied tag to a command.  Each
> > >> command can therefore have one and only one locality.  The TPM doesn't
> > > 
> > > Actually this was not unclear at all. I even read the chapters from
> > > Ariel Segall's yesterday as a refresher.
> > > 
> > > I was merely asking that if TPM_ACCESS_X is not properly cleared and you
> > > se TPM_ACCESS_Y where Y < X how does the hardware react as the bug
> > > report is pretty open ended and not very clear of the steps leading to
> > > unwanted results.
> > > 
> > > With a quick check from [1] could not spot the conflict reaction but
> > > it is probably there.
> >
> > The expected behavior is explained in the Informative Comment of section 
> > 6.5.2.4 of the Client PTP spec[1]:
> >
> > "The purpose of this register is to allow the processes operating at the 
> > various localities to share the TPM. The basic notion is that any 
> > locality can request access to the TPM by setting the 
> > TPM_ACCESS_x.requestUse field using its assigned TPM_ACCESS_x register 
> > address. If there is no currently set locality, the TPM sets current 
> > locality to the requesting one and allows operations only from that 
> > locality. If the TPM is currently at another locality, the TPM keeps the 
> > request pending until the currently executing locality frees the TPM. 
>
> Right.
>
> I'd think it would make sense to document the basic dance like this as
> part of kdoc for request_locality:
>
> * Setting TPM_ACCESS_x.requestUse:
> *  1. No locality reserved => set locality.
> *  2. Locality reserved => set pending.
>
> I.e. easy reminder with enough granularity.

Also for any non-TPM kernel developer this should be enough to get the
basic gist of the mechanism without spending too much time reading.

BR, Jarkko
Jarkko Sakkinen Feb. 23, 2024, 8:44 p.m. UTC | #32
On Fri Feb 23, 2024 at 3:56 AM EET, Daniel P. Smith wrote:
> On 2/20/24 15:54, Lino Sanfilippo wrote:
> > Hi,
> > 
> > On 20.02.24 19:42, Alexander Steffen wrote:
> >> ATTENTION: This e-mail is from an external sender. Please check attachments and links before opening e.g. with mouseover.
> >>
> >>
> >> On 02.02.2024 04:08, Lino Sanfilippo wrote:
> >>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
> >>>
> >>>>
> >>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> >>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
> >>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
> >>>>> is indiscriminately decremented. Thus creating a situation for an integer
> >>>>> underflow of the counter.
> >>>>
> >>>> What is the sequence of events that leads to this triggering the
> >>>> underflow? This information should be represent in the commit message.
> >>>>
> >>>
> >>> AFAIU this is:
> >>>
> >>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
> >>> for the first time, but since a locality is (unexpectedly) already active
> >>> check_locality() and consequently __tpm_tis_request_locality() return "true".
> >>
> >> check_locality() returns true, but __tpm_tis_request_locality() returns
> >> the requested locality. Currently, this is always 0, so the check for
> >> !ret will always correctly indicate success and increment the
> >> locality_count.
> >>
> > 
> > Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
> > be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
> > passed from one function to another.
> > But this is rather code optimization and not really required to fix the reported bug.
>
> Actually, doing so will break the TPM API. The function 
> tpm_tis_request_locality() is registered as the locality handler, 
>   int (*request_locality)(struct tpm_chip *chip, int loc), in the tis 
> instance of struct tpm_class_ops{}. This is the API used by the Secure 
> Launch series to open Locality2 for the measurements it must record.

OK, based on James' earlier feedback on possibility to have kernel
specific locality and this , and some reconsideration of my position on
the topic, and reading all these great and informative responses, I
think I went too far with this :-)

>
> v/r,
> dps

BR, Jarkko
Jarkko Sakkinen Feb. 23, 2024, 8:50 p.m. UTC | #33
On Fri Feb 23, 2024 at 3:57 AM EET, Daniel P. Smith wrote:
> On 2/20/24 17:31, Jarkko Sakkinen wrote:
> > On Tue Feb 20, 2024 at 10:26 PM UTC, Jarkko Sakkinen wrote:
> >> On Tue Feb 20, 2024 at 8:54 PM UTC, Lino Sanfilippo wrote:
> >>> for (i = 0; i <= MAX_LOCALITY; i++)
> >>> 	__tpm_tis_relinquish_locality(priv, i);
> >>
> >> I'm pretty unfamiliar with Intel TXT so asking a dummy question:
> >> if Intel TXT uses locality 2 I suppose we should not try to
> >> relinquish it, or?
> >>
> >> AFAIK, we don't have a symbol called MAX_LOCALITY.
> > 
> > OK it was called TPM_MAX_LOCALITY :-) I had the patch set applied
> > in one branch but looked up with wrong symbol name.
> > 
> > So I reformalize my question to two parts:
> > 
> > 1. Why does TXT leave locality 2 open in the first place? I did
> >     not see explanation. Isn't this a bug in TXT?
>
> It does so because that is what the TCG D-RTM specification requires. 
> See Section 5.3.4.10 of the TCG D-RTM specification[1], the first 
> requirement is, "The DLME SHALL receive control with access to Locality 2."

From below also the locality enumeration would be good to have
documented (as a reminder).

>
> > 2. Because localities are not too useful these days given TPM2's
> >     policy mechanism I cannot recall out of top of my head can
> >     you have two localities open at same time. So what kind of
> >     conflict happens when you try to open locality 0 and have
> >     locality 2 open?
>
> I would disagree and would call your attention to the TCG's 
> definition/motivation for localities, Section 3.2 of Client PTP 
> specification[2].
>
> "“Locality” is an assertion to the TPM that a command’s source is 
> associated with a particular component. Locality can be thought of as a 
> hardware-based authorization. The TPM is not actually aware of the 
> nature of the relationship between the locality and the component. The 
> ability to reset and extend notwithstanding, it is important to note 
> that, from a PCR “usage” perspective, there is no hierarchical 
> relationship between different localities. The TPM simply enforces 
> locality restrictions on TPM assets (such as PCR or SEALed blobs)."
>
> As stated, from the TPM specification perspective, it is not aware of 
> this mapping to components and leaves it to the platform to enforce.

Yeah, TPM is a passive component, not active actor, in everything.

The way I see locality as way to separate e.g. kernel and user space
driver TPM transactions is pretty much like actor-dependent salt
(e.g. if 0 was for user space and 1 was for kernel).

>
> "The protection and separation of the localities (and therefore the 
> association with the associated components) is entirely the 
> responsibility of the platform components. Platform components, 
> including the OS, may provide the separation of localities using 
> protection mechanisms such as virtual memory or paging."
>
> The x86 manufactures opted to adopt the D-RTM specification which 
> defines the components as follows:
>
> Locality 4: Usually associated with the CPU executing microcode. This is 
> used to establish the Dynamic RTM.
> Locality 3: Auxiliary components. Use of this is optional and, if used, 
> it is implementation dependent.
> Locality 2: Dynamically Launched OS (Dynamic OS) “runtime” environment.
> Locality 1: An environment for use by the Dynamic OS.
> Locality 0: The Static RTM, its chain of trust and its environment.
>
> And the means to protect and separate those localities are encoded in 
> the x86 chipset, i.e A D-RTM Event must be used to access any of the 
> D-RTM Localities (Locality1 - Locality4).
>
> For Intel, Locality 4 can only be accessed when a dedicated signal 
> between the CPU and the chipset is raised, thus only allowing the CPU to 
> utilize Locality 4. The CPU will then close Locality 4, authenticate and 
> give control to the ACM with access to Locality 3. When the ACM is 
> complete, it will instruct the chipset to lock Locality 3 and give 
> control to the DLME (MLE in Intel parlance) with Locality 2 open. It is 
> up to the DLME, the Linux kernel in this case, to decide how to assign 
> components to Locality 1 and 2.
>
> As to proposals to utilize localities by the Linux kernel, the only one 
> I was aware of was dropped because they couldn't open the higher localities.
>
> I would also highlight that the D-RTM implementation guide for Arm 
> allows for a hardware D-RTM event, which the vendor may choose to 
> implement a hardware/CPU enforced access to TPM localities. Thus, the 
> ability to support localities will also become a requirement for certain 
> Arm CPUs.
>
> [1] 
> https://trustedcomputinggroup.org/wp-content/uploads/TCG_D-RTM_Architecture_v1-0_Published_06172013.pdf
> [2] 
> https://trustedcomputinggroup.org/wp-content/uploads/PC-Client-Specific-Platform-TPM-Profile-for-TPM-2p0-v1p05p_r14_pub.pdf

BR, Jarkko
Lino Sanfilippo Feb. 24, 2024, 2:06 a.m. UTC | #34
Hi,

On 20.02.24 19:42, Alexander Steffen wrote:
> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>
>>>
>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>> Commit 933bfc5ad213 introduced the use of a locality counter to control when a
>>>> locality request is allowed to be sent to the TPM. In the commit, the counter
>>>> is indiscriminately decremented. Thus creating a situation for an integer
>>>> underflow of the counter.
>>>
>>> What is the sequence of events that leads to this triggering the
>>> underflow? This information should be represent in the commit message.
>>>
>>
>> AFAIU this is:
>>
>> 1. We start with a locality_counter of 0 and then we call tpm_tis_request_locality()
>> for the first time, but since a locality is (unexpectedly) already active
>> check_locality() and consequently __tpm_tis_request_locality() return "true".
>
> check_locality() returns true, but __tpm_tis_request_locality() returns the requested locality. Currently, this is always 0, so the check for !ret will always correctly indicate success and increment the locality_count.

So how was the reported underflow triggered then? We only request locality 0 in TPM TIS core, no other locality.

>
> But since theoretically a locality != 0 could be requested, the correct fix would be to check for something like ret >= 0 or ret == l instead of !ret.

I thought that the underflow issue has actually been triggered and is not only a theoretical problem.
But now I wonder how this could ever happen.

BR,
Lino
Lino Sanfilippo Feb. 24, 2024, 2:34 a.m. UTC | #35
On 23.02.24 02:56, Daniel P. Smith wrote:

>>
>> Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
>> be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
>> passed from one function to another.
>> But this is rather code optimization and not really required to fix the reported bug.
> 
> Actually, doing so will break the TPM API. The function
> tpm_tis_request_locality() is registered as the locality handler,
>  int (*request_locality)(struct tpm_chip *chip, int loc), in the tis
> instance of struct tpm_class_ops{}. This is the API used by the Secure
> Launch series to open Locality2 for the measurements it must record.
> 

I dont understand this. How do you use locality 2 with the current mainline
API? Do you adjust the mainline code to use locality 2 instead of 0? This would 
at least explain how you ran into the underflow issue which from
the source code seems to be impossible when using locality 0. But then I wonder why
this has not been made clear in this discussion. And then we are talking
about fixing a bug that does not even exist in the upstream code. 


BR,
Lino
Daniel P. Smith Feb. 25, 2024, 11:23 a.m. UTC | #36
On 2/23/24 07:58, Jarkko Sakkinen wrote:
> On Fri Feb 23, 2024 at 3:58 AM EET, Daniel P. Smith wrote:
>>> Just adding here that I wish we also had a log transcript of bug, which
>>> is right now missing. The explanation believable enough to move forward
>>> but I still wish to see a log transcript.
>>
>> That will be forth coming.
> 
> I did not respond yet to other responses that you've given in the past
> 12'ish hours or so (just woke up) but I started to think how all this
> great and useful information would be best kept in memory. Some of it
> has been discussed in the past but there is lot of small details that
> are too easily forgotten.
> 
> I'd think the best "documentation" approach here would be inject the
> spec references to the sites where locality behaviour is changed so
> that it is easy in future cross-reference them, and least of risk
> of having code changes that would break anything. I think this way
> all the information that you provided is best preserved for the
> future.
> 
> Thanks a lot for great and informative responses!

No problem at all.

Here is a serial output[1] from a dynamic launch using Linux Secure 
Launch v7[2] with one additional patch[3] to dump TPM driver state.

[1] https://paste.debian.net/1308538/
[2] 
https://lore.kernel.org/lkml/scpu273f2mprr2uvjlyk2xrjjtcduhse2eb45lmj7givn6jh4u@i2v4f2vbldu4/T/
[3] https://paste.debian.net/1308539/
Jarkko Sakkinen Feb. 26, 2024, 9:38 a.m. UTC | #37
On Sat Feb 24, 2024 at 4:34 AM EET, Lino Sanfilippo wrote:
>
>
> On 23.02.24 02:56, Daniel P. Smith wrote:
>
> >>
> >> Will the TPM TIS CORE ever (have to) request another locality than 0? Maybe the best would
> >> be to hardcode TPM_ACCESS(0) and get rid of all the locality parameters that are
> >> passed from one function to another.
> >> But this is rather code optimization and not really required to fix the reported bug.
> > 
> > Actually, doing so will break the TPM API. The function
> > tpm_tis_request_locality() is registered as the locality handler,
> >  int (*request_locality)(struct tpm_chip *chip, int loc), in the tis
> > instance of struct tpm_class_ops{}. This is the API used by the Secure
> > Launch series to open Locality2 for the measurements it must record.
> > 
>
> I dont understand this. How do you use locality 2 with the current mainline
> API? Do you adjust the mainline code to use locality 2 instead of 0? This would 
> at least explain how you ran into the underflow issue which from
> the source code seems to be impossible when using locality 0. But then I wonder why
> this has not been made clear in this discussion. And then we are talking
> about fixing a bug that does not even exist in the upstream code. 

Thanks for bringing this up, now I finally figured out what confuses me
in this series.

Daniel, I also have troubles understanding why locality_count would ever
be greater than zero exactly in the mainline kernel, *without* [1]?

[1] https://lore.kernel.org/linux-integrity/20240214221847.2066632-1-ross.philipson@oracle.com/

BR, Jarkko
Jarkko Sakkinen Feb. 26, 2024, 9:39 a.m. UTC | #38
On Sun Feb 25, 2024 at 1:23 PM EET, Daniel P. Smith wrote:
> On 2/23/24 07:58, Jarkko Sakkinen wrote:
> > On Fri Feb 23, 2024 at 3:58 AM EET, Daniel P. Smith wrote:
> >>> Just adding here that I wish we also had a log transcript of bug, which
> >>> is right now missing. The explanation believable enough to move forward
> >>> but I still wish to see a log transcript.
> >>
> >> That will be forth coming.
> > 
> > I did not respond yet to other responses that you've given in the past
> > 12'ish hours or so (just woke up) but I started to think how all this
> > great and useful information would be best kept in memory. Some of it
> > has been discussed in the past but there is lot of small details that
> > are too easily forgotten.
> > 
> > I'd think the best "documentation" approach here would be inject the
> > spec references to the sites where locality behaviour is changed so
> > that it is easy in future cross-reference them, and least of risk
> > of having code changes that would break anything. I think this way
> > all the information that you provided is best preserved for the
> > future.
> > 
> > Thanks a lot for great and informative responses!
>
> No problem at all.
>
> Here is a serial output[1] from a dynamic launch using Linux Secure 
> Launch v7[2] with one additional patch[3] to dump TPM driver state.

But are this fixes for a kernel tree with [2] applied.

If the bugs do not occur in the mainline tree without the out-of-tree
feature, they are not bug fixes. They should then really be part of that
series.

BR, Jarkko
Alexander Steffen Feb. 26, 2024, 12:43 p.m. UTC | #39
On 23.02.2024 02:55, Daniel P. Smith wrote:
> On 2/20/24 13:42, Alexander Steffen wrote:
>> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>>
>>>>
>>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>>> Commit 933bfc5ad213 introduced the use of a locality counter to
>>>>> control when a
>>>>> locality request is allowed to be sent to the TPM. In the commit,
>>>>> the counter
>>>>> is indiscriminately decremented. Thus creating a situation for an
>>>>> integer
>>>>> underflow of the counter.
>>>>
>>>> What is the sequence of events that leads to this triggering the
>>>> underflow? This information should be represent in the commit message.
>>>>
>>>
>>> AFAIU this is:
>>>
>>> 1. We start with a locality_counter of 0 and then we call
>>> tpm_tis_request_locality()
>>> for the first time, but since a locality is (unexpectedly) already 
>>> active
>>> check_locality() and consequently __tpm_tis_request_locality() return
>>> "true".
>>
>> check_locality() returns true, but __tpm_tis_request_locality() returns
>> the requested locality. Currently, this is always 0, so the check for
>> !ret will always correctly indicate success and increment the
>> locality_count.
>>
>> But since theoretically a locality != 0 could be requested, the correct
>> fix would be to check for something like ret >= 0 or ret == l instead of
>> !ret. Then the counter will also be incremented correctly for localities
>> != 0, and no underflow will happen later on. Therefore, explicitly
>> checking for an underflow is unnecessary and hides the real problem.
>>
> 
> My apologies, but I will have to humbly disagree from a fundamental
> level here. If a state variable has bounds, then those bounds should be
> enforced when the variable is being manipulated.

That's fine, but that is not what your proposed fix does.

tpm_tis_request_locality and tpm_tis_relinquish_locality are meant to be 
called in pairs: for every (successful) call to tpm_tis_request_locality 
there *must* be a corresponding call to tpm_tis_relinquish_locality 
afterwards. Unfortunately, in C there is no language construct to 
enforce that (nothing like a Python context manager), so instead 
locality_count is used to count the number of successful calls to 
tpm_tis_request_locality, so that tpm_tis_relinquish_locality can wait 
to actually relinquish the locality until the last expected call has 
happened (you can think of that as a Python RLock, to stay with the 
Python analogies).

So if locality_count ever gets negative, that is certainly a bug 
somewhere. But your proposed fix hides this bug, by allowing 
tpm_tis_relinquish_locality to be called more often than 
tpm_tis_request_locality. You could have added something like 
BUG_ON(priv->locality_count == 0) before decrementing the counter. That 
would really enforce the bounds, without hiding the bug, and I would be 
fine with that.

Of course, that still leaves the actual bug to be fixed. In this case, 
there is no mismatch between the calls to tpm_tis_request_locality and 
tpm_tis_relinquish_locality. It is just (as I said before) that the 
counting of successful calls in tpm_tis_request_locality is broken for 
localities != 0, so that is what you need to fix.

> Assuming that every
> path leading to the variable manipulation code has ensured proper
> manipulation is just that, an assumption. When assumptions fail is how
> bugs and vulnerabilities occur.
> 
> To your point, does this full address the situation experienced, I would
> say it does not. IMHO, the situation is really a combination of both
> patch 1 and patch 2, but the request was to split the changes for
> individual discussion. We selected this one as being the fixes for two
> reasons. First, it blocks the underflow such that when the Secure Launch
> series opens Locality 2, it will get incremented at that time and the
> internal locality tracking state variables will end up with the correct
> values. Thus leading to the relinquish succeeding at kernel shutdown.
> Second, it provides a stronger defensive coding practice.
> 
> Another reason that this works as a fix is that the TPM specification
> requires the registers to be mirrored across all localities, regardless
> of the active locality. While all the request/relinquishes for Locality
> 0 sent by the early code do not succeed, obtaining the values via the
> Locality 0 registers are still guaranteed to be correct.
> 
> v/r,
> dps
diff mbox series

Patch

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 1b350412d8a6..4176d3bd1f04 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -180,7 +180,8 @@  static int tpm_tis_relinquish_locality(struct tpm_chip *chip, int l)
 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
 
 	mutex_lock(&priv->locality_count_mutex);
-	priv->locality_count--;
+	if (priv->locality_count > 0)
+		priv->locality_count--;
 	if (priv->locality_count == 0)
 		__tpm_tis_relinquish_locality(priv, l);
 	mutex_unlock(&priv->locality_count_mutex);