TPM selftest failure in 4.15
diff mbox

Message ID 1517498648.3145.4.camel@HansenPartnership.com
State New
Headers show

Commit Message

James Bottomley Feb. 1, 2018, 3:24 p.m. UTC
On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote:
> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote:
> > 
> > Dear James,
> > 
> > 
> > On 02/01/18 13:16, James Bottomley wrote:
> > > 
> > > 
> > > Embarrassingly enough, I'm just on my way to do a TPM talk at
> > > FOSDEM.   I installed my shiny new 4.15 kernel on the 'plane and
> > > this is what I got after I arrived this morning:
> > > 
> > > jejb@jarvis:~> dmesg | grep -i tpm
> > > [    0.000000] ACPI: TPM2 0x0000000079446CC0 000034
> > > (v03        Tpm2Tabl 00000001 AMI  00000000)
> > > [    1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-
> > > id 
> > > 2)
> > > [    1.608863] tpm tpm0: A TPM error (2314) occurred continue
> > > selftest
> > > [    1.640052] tpm tpm0: A TPM error (2314) occurred continue
> > > selftest
> > > [    1.691215] tpm tpm0: A TPM error (2314) occurred continue
> > > selftest
> > > [    1.782377] tpm tpm0: A TPM error (2314) occurred continue
> > > selftest
> > > [    1.953539] tpm tpm0: A TPM error (2314) occurred continue
> > > selftest
> > > [    2.284701] tpm tpm0: A TPM error (2314) occurred continue
> > > selftest
> > > [    2.935743] tpm tpm0: A TPM error (2314) occurred continue
> > > selftest
> > > [    4.216236] tpm tpm0: TPM self test failed
> > > [    4.236829] ima: No TPM chip found, activating TPM-bypass!
> > > (rc=-
> > > 19)
> > > 
> > > The error is TPM_RC_TESTING, which means it looks like we don't
> > > wait long enough for the selftests to complete.  I get this all
> > > the time booting with 4.15.  Fortunately I have a 4.13 backup
> > > kernel which is fine (otherwise I'd be a bit hosed since all my
> > > keys now require a TPM).
> > > 
> > > I'll debug on the train; my current suspicion is that the
> > > TPM_LONG duration might be a bit short for this chip (A nuvoton
> > > 6xx in a dell XPS-13).
> > 
> > Please join the thread [1], where I reported the same problem for
> > the Dell XPS 13 9360. Unfortunately, no solution was found,
> > especially, as I did not use the TPM. Other owners of that system
> > unfortunately didn’t have time to report back if it work for them,
> > so the “conclusion” kind of was, that my TPM was broken, and had to
> > be tested.
> 
> OK, I'll try to find a fix.  It's clearly a marginal problem since
> I've booted most -rc kernels without issue, so there's some slight
> timing change in 4.15 that triggered it.  It could also be a shutdown
> issue.  Any NV ram stuff deferred to start up would take a variable
> amount of time.
> 
> You'd almost think it's some sort of TPM self protest: the more stuff
> I use it for the more problems it seems to create.  I'm definitely
> motivated to fix it because without a TPM I can't actually do much
> with my laptop.

OK, I investigated but now my TPM has returned to normal (as in it
passes the selftest immediately).  There's clearly something that makes
it return TPM_RC_TESTING to every self test probe for seconds at a
time, but I don't know what it is.  Sending a different command seems
to cause the problem to clear (Managed to reproduce once with the patch
to verify), so this is my proposed fix.  It's clearly nonsensical to
detach the driver because the self test still returns TPM_RC_TESTING,
so convert that return to a TPM_RC_SUCCESS on timeout.  It prints a
warning message so we'll see it in the logs if it causes problems.
 Given that this seems to be some type of internal TPM issue, I don't
believe changing the timings would work.

James

---

Comments

Jason Gunthorpe Feb. 1, 2018, 5:40 p.m. UTC | #1
On Thu, Feb 01, 2018 at 03:24:08PM +0000, James Bottomley wrote:

> OK, I investigated but now my TPM has returned to normal (as in it
> passes the selftest immediately).  There's clearly something that makes
> it return TPM_RC_TESTING to every self test probe for seconds at a
> time, but I don't know what it is.  Sending a different command seems
> to cause the problem to clear (Managed to reproduce once with the patch
> to verify), so this is my proposed fix.  It's clearly nonsensical to
> detach the driver because the self test still returns TPM_RC_TESTING,
> so convert that return to a TPM_RC_SUCCESS on timeout.  It prints a
> warning message so we'll see it in the logs if it causes problems.
>  Given that this seems to be some type of internal TPM issue, I don't
> believe changing the timings would work.

But if a selftest returns TPM2_RC_TESTING I would expect the next
command to also fail with a testing in progress code? At least by the
spec..

The point of invoking selftest is to get to a state where future TPM
commands will succeed, so returning immediately on RC_TESTING seems
wrong?

Jason
James Bottomley Feb. 1, 2018, 6:46 p.m. UTC | #2
On Thu, 2018-02-01 at 10:40 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 01, 2018 at 03:24:08PM +0000, James Bottomley wrote:
> 
> > 
> > OK, I investigated but now my TPM has returned to normal (as in it
> > passes the selftest immediately).  There's clearly something that
> > makes it return TPM_RC_TESTING to every self test probe for seconds
> > at a time, but I don't know what it is.  Sending a different
> > command seems to cause the problem to clear (Managed to reproduce
> > once with the patch to verify), so this is my proposed fix.  It's
> > clearly nonsensical to detach the driver because the self test
> > still returns TPM_RC_TESTING, so convert that return to a
> > TPM_RC_SUCCESS on timeout.  It prints a warning message so we'll
> > see it in the logs if it causes problems.  Given that this seems to
> > be some type of internal TPM issue, I don't believe changing the
> > timings would work.
> 
> But if a selftest returns TPM2_RC_TESTING I would expect the next
> command to also fail with a testing in progress code? At least by the
> spec..

No, the spec says only that the command may fail with TPM_RC_TESTING
*if* the system it requires is under test.

I have no idea what's up with my TPM.  The selftests usually complete
in under 1ms and return TPM_RC_SUCCESS.  However occasionally the self
test can return TPM_RC_TESTING for seconds.

I've already booted with the patch and verified it doesn't induce any
failures for me.  Either the long running test is in a completely
unconnected subsystem or invoking a different command forces the TPM to
clear the testing state.

> The point of invoking selftest is to get to a state where future TPM
> commands will succeed, so returning immediately on RC_TESTING seems
> wrong?

Well it's definitely less wrong than the converse of actually detaching
the TPM driver because a self test is apparently still in progress.  If
you depend on the TPM for a critical function, your entire system is
hosed at that point.

I honestly don't think we should be waiting for the self test at all.
 We should kick it off and treat any TPM_RC_TESTING error as -EAGAIN.
 We're already under fire for slow boot sequences and adding 2s just to
wait for the TPM to self test adds to that for no real value.

James
Jason Gunthorpe Feb. 1, 2018, 6:59 p.m. UTC | #3
On Thu, Feb 01, 2018 at 07:46:04PM +0100, James Bottomley wrote:

> I honestly don't think we should be waiting for the self test at all.
> We should kick it off and treat any TPM_RC_TESTING error as -EAGAIN.
> We're already under fire for slow boot sequences and adding 2s just to
> wait for the TPM to self test adds to that for no real value.

Arguably the BIOS should have completed the selftest - this stuff
generally only exists to support embedded.

I don't like the idea of EAGAIN, that just expose all our users to
this mess.

I would support making transmit_cmd genericly wait and retry if the
TPM insists we need to wait for selftest to complete the specific
command though.

Jason
Paul Menzel Feb. 1, 2018, 7:16 p.m. UTC | #4
Dear James,


Am 01.02.2018 um 16:24 schrieb James Bottomley:
> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote:
>> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote:

>>> On 02/01/18 13:16, James Bottomley wrote:
>>>>
>>>>
>>>> Embarrassingly enough, I'm just on my way to do a TPM talk at
>>>> FOSDEM.   I installed my shiny new 4.15 kernel on the 'plane and
>>>> this is what I got after I arrived this morning:
>>>>
>>>> jejb@jarvis:~> dmesg | grep -i tpm
>>>> [    0.000000] ACPI: TPM2 0x0000000079446CC0 000034 (v03        Tpm2Tabl 00000001 AMI  00000000)
>>>> [    1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
>>>> [    1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>> [    1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>> [    1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>> [    1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>> [    1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>> [    2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>> [    2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>> [    4.216236] tpm tpm0: TPM self test failed
>>>> [    4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
>>>>
>>>> The error is TPM_RC_TESTING, which means it looks like we don't
>>>> wait long enough for the selftests to complete.  I get this all
>>>> the time booting with 4.15.  Fortunately I have a 4.13 backup
>>>> kernel which is fine (otherwise I'd be a bit hosed since all my
>>>> keys now require a TPM).
>>>>
>>>> I'll debug on the train; my current suspicion is that the
>>>> TPM_LONG duration might be a bit short for this chip (A nuvoton
>>>> 6xx in a dell XPS-13).
>>>
>>> Please join the thread [1], where I reported the same problem for
>>> the Dell XPS 13 9360. Unfortunately, no solution was found,
>>> especially, as I did not use the TPM. Other owners of that system
>>> unfortunately didn’t have time to report back if it work for them,
>>> so the “conclusion” kind of was, that my TPM was broken, and had to
>>> be tested.
>>
>> OK, I'll try to find a fix.  It's clearly a marginal problem since
>> I've booted most -rc kernels without issue, so there's some slight
>> timing change in 4.15 that triggered it.  It could also be a shutdown
>> issue.  Any NV ram stuff deferred to start up would take a variable
>> amount of time.
>>
>> You'd almost think it's some sort of TPM self protest: the more stuff
>> I use it for the more problems it seems to create.  I'm definitely
>> motivated to fix it because without a TPM I can't actually do much
>> with my laptop.
> 
> OK, I investigated but now my TPM has returned to normal (as in it
> passes the selftest immediately).  There's clearly something that makes
> it return TPM_RC_TESTING to every self test probe for seconds at a
> time, but I don't know what it is.  Sending a different command seems
> to cause the problem to clear (Managed to reproduce once with the patch
> to verify), so this is my proposed fix.  It's clearly nonsensical to
> detach the driver because the self test still returns TPM_RC_TESTING,
> so convert that return to a TPM_RC_SUCCESS on timeout.  It prints a
> warning message so we'll see it in the logs if it causes problems.
>   Given that this seems to be some type of internal TPM issue, I don't
> believe changing the timings would work.

Maybe Mario can confirm this issue too, now that Linux 4.15 is released. 
Maybe he also has a way to get the Nuvoton people involved.

> ---
> 
> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
> index f40d20671a78..3e1b062d8888 100644
> --- a/drivers/char/tpm/tpm2-cmd.c
> +++ b/drivers/char/tpm/tpm2-cmd.c
> @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip)
>   		/* wait longer the next round */
>   		delay_msec *= 2;
>   	}
> +	if (rc == TPM2_RC_TESTING) {
> +		/*
> +		 * A return of RC_TESTING means the TPM is still
> +		 * running self tests.  If one fails it will go into
> +		 * failure mode and return RC_FAILED to every command,
> +		 * so treat a still in testing return as a success
> +		 * rather than causing a driver detach.
> +		 */
> +		dev_err(&chip->dev,"TPM: Still in testing mode after %dms, continuing\n", delay_msec);
> +		rc = TPM2_RC_SUCCESS;
> +	}
>   
>   	return rc;
>   }

Alexander replied the following in the other thread. No idea if you read 
it yet.

> The list of "A TPM error (2314) occurred continue selftest" is caused by my 
> commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React correctly to
> RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so the TPM
> tells us that self-tests are still running in the background. This problem was
> not visible in previous versions, since it (incorrectly) ignored > TPM_RC_TESTING.

Maybe the commit should be reverted for now until this has cleared up 
for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy.


Kind regards,

Paul


PS: Alexander will also be at FOSDEM and mentioned your talk [2].


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125a2210541079e8e7c69e629ad06cabed788f8[2] 
https://lists.01.org/pipermail/tpm2/2018-January/000486.html
Paul Menzel Feb. 1, 2018, 7:17 p.m. UTC | #5
[resend with regressions@ address fixed, sorry]

Am 01.02.2018 um 20:16 schrieb Paul Menzel:
> Dear James,
> 
> 
> Am 01.02.2018 um 16:24 schrieb James Bottomley:
>> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote:
>>> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote:
> 
>>>> On 02/01/18 13:16, James Bottomley wrote:
>>>>>
>>>>>
>>>>> Embarrassingly enough, I'm just on my way to do a TPM talk at
>>>>> FOSDEM.   I installed my shiny new 4.15 kernel on the 'plane and
>>>>> this is what I got after I arrived this morning:
>>>>>
>>>>> jejb@jarvis:~> dmesg | grep -i tpm
>>>>> [    0.000000] ACPI: TPM2 0x0000000079446CC0 000034 
>>>>> (v03        Tpm2Tabl 00000001 AMI  00000000)
>>>>> [    1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
>>>>> [    1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>>> [    1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>>> [    1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>>> [    1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>>> [    1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>>> [    2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>>> [    2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest
>>>>> [    4.216236] tpm tpm0: TPM self test failed
>>>>> [    4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
>>>>>
>>>>> The error is TPM_RC_TESTING, which means it looks like we don't
>>>>> wait long enough for the selftests to complete.  I get this all
>>>>> the time booting with 4.15.  Fortunately I have a 4.13 backup
>>>>> kernel which is fine (otherwise I'd be a bit hosed since all my
>>>>> keys now require a TPM).
>>>>>
>>>>> I'll debug on the train; my current suspicion is that the
>>>>> TPM_LONG duration might be a bit short for this chip (A nuvoton
>>>>> 6xx in a dell XPS-13).
>>>>
>>>> Please join the thread [1], where I reported the same problem for
>>>> the Dell XPS 13 9360. Unfortunately, no solution was found,
>>>> especially, as I did not use the TPM. Other owners of that system
>>>> unfortunately didn’t have time to report back if it work for them,
>>>> so the “conclusion” kind of was, that my TPM was broken, and had to
>>>> be tested.
>>>
>>> OK, I'll try to find a fix.  It's clearly a marginal problem since
>>> I've booted most -rc kernels without issue, so there's some slight
>>> timing change in 4.15 that triggered it.  It could also be a shutdown
>>> issue.  Any NV ram stuff deferred to start up would take a variable
>>> amount of time.
>>>
>>> You'd almost think it's some sort of TPM self protest: the more stuff
>>> I use it for the more problems it seems to create.  I'm definitely
>>> motivated to fix it because without a TPM I can't actually do much
>>> with my laptop.
>>
>> OK, I investigated but now my TPM has returned to normal (as in it
>> passes the selftest immediately).  There's clearly something that makes
>> it return TPM_RC_TESTING to every self test probe for seconds at a
>> time, but I don't know what it is.  Sending a different command seems
>> to cause the problem to clear (Managed to reproduce once with the patch
>> to verify), so this is my proposed fix.  It's clearly nonsensical to
>> detach the driver because the self test still returns TPM_RC_TESTING,
>> so convert that return to a TPM_RC_SUCCESS on timeout.  It prints a
>> warning message so we'll see it in the logs if it causes problems.
>>   Given that this seems to be some type of internal TPM issue, I don't
>> believe changing the timings would work.
> 
> Maybe Mario can confirm this issue too, now that Linux 4.15 is released. 
> Maybe he also has a way to get the Nuvoton people involved.
> 
>> ---
>>
>> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
>> index f40d20671a78..3e1b062d8888 100644
>> --- a/drivers/char/tpm/tpm2-cmd.c
>> +++ b/drivers/char/tpm/tpm2-cmd.c
>> @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip)
>>           /* wait longer the next round */
>>           delay_msec *= 2;
>>       }
>> +    if (rc == TPM2_RC_TESTING) {
>> +        /*
>> +         * A return of RC_TESTING means the TPM is still
>> +         * running self tests.  If one fails it will go into
>> +         * failure mode and return RC_FAILED to every command,
>> +         * so treat a still in testing return as a success
>> +         * rather than causing a driver detach.
>> +         */
>> +        dev_err(&chip->dev,"TPM: Still in testing mode after %dms, 
>> continuing\n", delay_msec);
>> +        rc = TPM2_RC_SUCCESS;
>> +    }
>>       return rc;
>>   }
> 
> Alexander replied the following in the other thread. No idea if you read 
> it yet.
> 
>> The list of "A TPM error (2314) occurred continue selftest" is caused 
>> by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React 
>> correctly to
>> RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so 
>> the TPM
>> tells us that self-tests are still running in the background. This 
>> problem was
>> not visible in previous versions, since it (incorrectly) ignored > 
>> TPM_RC_TESTING.
> 
> Maybe the commit should be reverted for now until this has cleared up 
> for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy.
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> PS: Alexander will also be at FOSDEM and mentioned your talk [2].
> 
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125a2210541079e8e7c69e629ad06cabed788f8[2] 
> https://lists.01.org/pipermail/tpm2/2018-January/000486.html
Mario Limonciello Feb. 1, 2018, 8:12 p.m. UTC | #6
> -----Original Message-----

> From: Paul Menzel [mailto:pmenzel@molgen.mpg.de]

> Sent: Thursday, February 1, 2018 1:17 PM

> To: James Bottomley <James.Bottomley@HansenPartnership.com>

> Cc: linux-integrity <linux-integrity@vger.kernel.org>; Limonciello, Mario

> <Mario_Limonciello@Dell.com>; regressions@leemhuis.info; Alexander Steffen

> <Alexander.Steffen@infineon.com>

> Subject: Re: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton 6xx)

> 

> [resend with regressions@ address fixed, sorry]

> 

> Am 01.02.2018 um 20:16 schrieb Paul Menzel:

> > Dear James,

> >

> >

> > Am 01.02.2018 um 16:24 schrieb James Bottomley:

> >> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote:

> >>> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote:

> >

> >>>> On 02/01/18 13:16, James Bottomley wrote:

> >>>>>

> >>>>>

> >>>>> Embarrassingly enough, I'm just on my way to do a TPM talk at

> >>>>> FOSDEM.   I installed my shiny new 4.15 kernel on the 'plane and

> >>>>> this is what I got after I arrived this morning:

> >>>>>

> >>>>> jejb@jarvis:~> dmesg | grep -i tpm

> >>>>> [    0.000000] ACPI: TPM2 0x0000000079446CC0 000034

> >>>>> (v03        Tpm2Tabl 00000001 AMI  00000000)

> >>>>> [    1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)

> >>>>> [    1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest

> >>>>> [    1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest

> >>>>> [    1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest

> >>>>> [    1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest

> >>>>> [    1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest

> >>>>> [    2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest

> >>>>> [    2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest

> >>>>> [    4.216236] tpm tpm0: TPM self test failed

> >>>>> [    4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19)

> >>>>>

> >>>>> The error is TPM_RC_TESTING, which means it looks like we don't

> >>>>> wait long enough for the selftests to complete.  I get this all

> >>>>> the time booting with 4.15.  Fortunately I have a 4.13 backup

> >>>>> kernel which is fine (otherwise I'd be a bit hosed since all my

> >>>>> keys now require a TPM).

> >>>>>

> >>>>> I'll debug on the train; my current suspicion is that the

> >>>>> TPM_LONG duration might be a bit short for this chip (A nuvoton

> >>>>> 6xx in a dell XPS-13).

> >>>>

> >>>> Please join the thread [1], where I reported the same problem for

> >>>> the Dell XPS 13 9360. Unfortunately, no solution was found,

> >>>> especially, as I did not use the TPM. Other owners of that system

> >>>> unfortunately didn’t have time to report back if it work for them,

> >>>> so the “conclusion” kind of was, that my TPM was broken, and had to

> >>>> be tested.

> >>>

> >>> OK, I'll try to find a fix.  It's clearly a marginal problem since

> >>> I've booted most -rc kernels without issue, so there's some slight

> >>> timing change in 4.15 that triggered it.  It could also be a shutdown

> >>> issue.  Any NV ram stuff deferred to start up would take a variable

> >>> amount of time.

> >>>

> >>> You'd almost think it's some sort of TPM self protest: the more stuff

> >>> I use it for the more problems it seems to create.  I'm definitely

> >>> motivated to fix it because without a TPM I can't actually do much

> >>> with my laptop.

> >>

> >> OK, I investigated but now my TPM has returned to normal (as in it

> >> passes the selftest immediately).  There's clearly something that makes

> >> it return TPM_RC_TESTING to every self test probe for seconds at a

> >> time, but I don't know what it is.  Sending a different command seems

> >> to cause the problem to clear (Managed to reproduce once with the patch

> >> to verify), so this is my proposed fix.  It's clearly nonsensical to

> >> detach the driver because the self test still returns TPM_RC_TESTING,

> >> so convert that return to a TPM_RC_SUCCESS on timeout.  It prints a

> >> warning message so we'll see it in the logs if it causes problems.

> >>   Given that this seems to be some type of internal TPM issue, I don't

> >> believe changing the timings would work.

> >

> > Maybe Mario can confirm this issue too, now that Linux 4.15 is released.

> > Maybe he also has a way to get the Nuvoton people involved.


James,

Did you actually experiment with changing the timings?

I was told that TPMs that are FIPS validated (such as that in the XPS 13) may 
take longer for the self tests to run.

> >

> >> ---

> >>

> >> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c

> >> index f40d20671a78..3e1b062d8888 100644

> >> --- a/drivers/char/tpm/tpm2-cmd.c

> >> +++ b/drivers/char/tpm/tpm2-cmd.c

> >> @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip)

> >>           /* wait longer the next round */

> >>           delay_msec *= 2;

> >>       }

> >> +    if (rc == TPM2_RC_TESTING) {

> >> +        /*

> >> +         * A return of RC_TESTING means the TPM is still

> >> +         * running self tests.  If one fails it will go into

> >> +         * failure mode and return RC_FAILED to every command,

> >> +         * so treat a still in testing return as a success

> >> +         * rather than causing a driver detach.

> >> +         */

> >> +        dev_err(&chip->dev,"TPM: Still in testing mode after %dms,

> >> continuing\n", delay_msec);

> >> +        rc = TPM2_RC_SUCCESS;

> >> +    }

> >>       return rc;

> >>   }

> >

> > Alexander replied the following in the other thread. No idea if you read

> > it yet.

> >

> >> The list of "A TPM error (2314) occurred continue selftest" is caused

> >> by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React

> >> correctly to

> >> RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so

> >> the TPM

> >> tells us that self-tests are still running in the background. This

> >> problem was

> >> not visible in previous versions, since it (incorrectly) ignored >

> >> TPM_RC_TESTING.

> >

> > Maybe the commit should be reverted for now until this has cleared up

> > for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy.

> >

> >

> > Kind regards,

> >

> > Paul

> >

> >

> > PS: Alexander will also be at FOSDEM and mentioned your talk [2].

> >

> >

> > [1]

> >

> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125

> a2210541079e8e7c69e629ad06cabed788f8[2]

> > https://lists.01.org/pipermail/tpm2/2018-January/000486.html
Mario Limonciello Feb. 1, 2018, 9:06 p.m. UTC | #7
> -----Original Message-----

> From: Limonciello, Mario

> Sent: Thursday, February 1, 2018 2:12 PM

> To: 'Paul Menzel' <pmenzel@molgen.mpg.de>; James Bottomley

> <James.Bottomley@HansenPartnership.com>

> Cc: linux-integrity <linux-integrity@vger.kernel.org>; regressions@leemhuis.info;

> Alexander Steffen <Alexander.Steffen@infineon.com>

> Subject: RE: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton 6xx)

> 

> 

> 

> > -----Original Message-----

> > From: Paul Menzel [mailto:pmenzel@molgen.mpg.de]

> > Sent: Thursday, February 1, 2018 1:17 PM

> > To: James Bottomley <James.Bottomley@HansenPartnership.com>

> > Cc: linux-integrity <linux-integrity@vger.kernel.org>; Limonciello, Mario

> > <Mario_Limonciello@Dell.com>; regressions@leemhuis.info; Alexander Steffen

> > <Alexander.Steffen@infineon.com>

> > Subject: Re: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton 6xx)

> >

> > [resend with regressions@ address fixed, sorry]

> >

> > Am 01.02.2018 um 20:16 schrieb Paul Menzel:

> > > Dear James,

> > >

> > >

> > > Am 01.02.2018 um 16:24 schrieb James Bottomley:

> > >> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote:

> > >>> On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote:

> > >

> > >>>> On 02/01/18 13:16, James Bottomley wrote:

> > >>>>>

> > >>>>>

> > >>>>> Embarrassingly enough, I'm just on my way to do a TPM talk at

> > >>>>> FOSDEM.   I installed my shiny new 4.15 kernel on the 'plane and

> > >>>>> this is what I got after I arrived this morning:

> > >>>>>

> > >>>>> jejb@jarvis:~> dmesg | grep -i tpm

> > >>>>> [    0.000000] ACPI: TPM2 0x0000000079446CC0 000034

> > >>>>> (v03        Tpm2Tabl 00000001 AMI  00000000)

> > >>>>> [    1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)

> > >>>>> [    1.608863] tpm tpm0: A TPM error (2314) occurred continue selftest

> > >>>>> [    1.640052] tpm tpm0: A TPM error (2314) occurred continue selftest

> > >>>>> [    1.691215] tpm tpm0: A TPM error (2314) occurred continue selftest

> > >>>>> [    1.782377] tpm tpm0: A TPM error (2314) occurred continue selftest

> > >>>>> [    1.953539] tpm tpm0: A TPM error (2314) occurred continue selftest

> > >>>>> [    2.284701] tpm tpm0: A TPM error (2314) occurred continue selftest

> > >>>>> [    2.935743] tpm tpm0: A TPM error (2314) occurred continue selftest

> > >>>>> [    4.216236] tpm tpm0: TPM self test failed

> > >>>>> [    4.236829] ima: No TPM chip found, activating TPM-bypass! (rc=-19)

> > >>>>>

> > >>>>> The error is TPM_RC_TESTING, which means it looks like we don't

> > >>>>> wait long enough for the selftests to complete.  I get this all

> > >>>>> the time booting with 4.15.  Fortunately I have a 4.13 backup

> > >>>>> kernel which is fine (otherwise I'd be a bit hosed since all my

> > >>>>> keys now require a TPM).

> > >>>>>

> > >>>>> I'll debug on the train; my current suspicion is that the

> > >>>>> TPM_LONG duration might be a bit short for this chip (A nuvoton

> > >>>>> 6xx in a dell XPS-13).

> > >>>>

> > >>>> Please join the thread [1], where I reported the same problem for

> > >>>> the Dell XPS 13 9360. Unfortunately, no solution was found,

> > >>>> especially, as I did not use the TPM. Other owners of that system

> > >>>> unfortunately didn’t have time to report back if it work for them,

> > >>>> so the “conclusion” kind of was, that my TPM was broken, and had to

> > >>>> be tested.

> > >>>

> > >>> OK, I'll try to find a fix.  It's clearly a marginal problem since

> > >>> I've booted most -rc kernels without issue, so there's some slight

> > >>> timing change in 4.15 that triggered it.  It could also be a shutdown

> > >>> issue.  Any NV ram stuff deferred to start up would take a variable

> > >>> amount of time.

> > >>>

> > >>> You'd almost think it's some sort of TPM self protest: the more stuff

> > >>> I use it for the more problems it seems to create.  I'm definitely

> > >>> motivated to fix it because without a TPM I can't actually do much

> > >>> with my laptop.

> > >>

> > >> OK, I investigated but now my TPM has returned to normal (as in it

> > >> passes the selftest immediately).  There's clearly something that makes

> > >> it return TPM_RC_TESTING to every self test probe for seconds at a

> > >> time, but I don't know what it is.  Sending a different command seems

> > >> to cause the problem to clear (Managed to reproduce once with the patch

> > >> to verify), so this is my proposed fix.  It's clearly nonsensical to

> > >> detach the driver because the self test still returns TPM_RC_TESTING,

> > >> so convert that return to a TPM_RC_SUCCESS on timeout.  It prints a

> > >> warning message so we'll see it in the logs if it causes problems.

> > >>   Given that this seems to be some type of internal TPM issue, I don't

> > >> believe changing the timings would work.

> > >

> > > Maybe Mario can confirm this issue too, now that Linux 4.15 is released.

> > > Maybe he also has a way to get the Nuvoton people involved.

> 

> James,

> 

> Did you actually experiment with changing the timings?

> 

> I was told that TPMs that are FIPS validated (such as that in the XPS 13) may

> take longer for the self tests to run.

> 

> > >

> > >> ---

> > >>

> > >> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c

> > >> index f40d20671a78..3e1b062d8888 100644

> > >> --- a/drivers/char/tpm/tpm2-cmd.c

> > >> +++ b/drivers/char/tpm/tpm2-cmd.c

> > >> @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct tpm_chip *chip)

> > >>           /* wait longer the next round */

> > >>           delay_msec *= 2;

> > >>       }

> > >> +    if (rc == TPM2_RC_TESTING) {

> > >> +        /*

> > >> +         * A return of RC_TESTING means the TPM is still

> > >> +         * running self tests.  If one fails it will go into

> > >> +         * failure mode and return RC_FAILED to every command,

> > >> +         * so treat a still in testing return as a success

> > >> +         * rather than causing a driver detach.

> > >> +         */

> > >> +        dev_err(&chip->dev,"TPM: Still in testing mode after %dms,

> > >> continuing\n", delay_msec);

> > >> +        rc = TPM2_RC_SUCCESS;

> > >> +    }

> > >>       return rc;

> > >>   }

> > >


I discussed this with some folks and although it would fix the problem it is not
accurately characterizing the situation.  What is likely happening here is that
issuing the self test command in succession is causing the TPM to restart the
self test and not complete.  Instead the selfTestDone bit should be polled.

I feel Paul is right, if a solution can't be brought up to do this instead, this 
commit 125a2210541079e8e7c69e629ad06cabed788f8c should be reverted.

> > > Alexander replied the following in the other thread. No idea if you read

> > > it yet.

> > >

> > >> The list of "A TPM error (2314) occurred continue selftest" is caused

> > >> by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React

> > >> correctly to

> > >> RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so

> > >> the TPM

> > >> tells us that self-tests are still running in the background. This

> > >> problem was

> > >> not visible in previous versions, since it (incorrectly) ignored >

> > >> TPM_RC_TESTING.

> > >

> > > Maybe the commit should be reverted for now until this has cleared up

> > > for the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy.

> > >

> > >

> > > Kind regards,

> > >

> > > Paul

> > >

> > >

> > > PS: Alexander will also be at FOSDEM and mentioned your talk [2].

> > >

> > >

> > > [1]

> > >

> >

> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=125

> > a2210541079e8e7c69e629ad06cabed788f8[2]

> > > https://lists.01.org/pipermail/tpm2/2018-January/000486.html
Jason Gunthorpe Feb. 1, 2018, 10:22 p.m. UTC | #8
On Thu, Feb 01, 2018 at 09:06:55PM +0000, Mario.Limonciello@dell.com wrote:

> I discussed this with some folks and although it would fix the
> problem it is not accurately characterizing the situation.  What is
> likely happening here is that issuing the self test command in
> succession is causing the TPM to restart the self test and not
> complete.  Instead the selfTestDone bit should be polled.

Oh, I wonder if that is beacuse this was copied from the TPM1 version
where we don't have a bit like that??

Should we check this selfTestDone bit before event sending the self
test?

Jsaon
James Bottomley Feb. 2, 2018, 5:46 a.m. UTC | #9
On Thu, 2018-02-01 at 20:12 +0000, Mario.Limonciello@dell.com wrote:
> 
> > 
> > -----Original Message-----
> > From: Paul Menzel [mailto:pmenzel@molgen.mpg.de]
> > Sent: Thursday, February 1, 2018 1:17 PM
> > To: James Bottomley <James.Bottomley@HansenPartnership.com>
> > Cc: linux-integrity <linux-integrity@vger.kernel.org>; Limonciello,
> > Mario
> > <Mario_Limonciello@Dell.com>; regressions@leemhuis.info; Alexander
> > Steffen
> > <Alexander.Steffen@infineon.com>
> > Subject: Re: TPM selftest failure in 4.15 (Dell XPS 13, Nuvoton
> > 6xx)
> > 
> > [resend with regressions@ address fixed, sorry]
> > 
> > Am 01.02.2018 um 20:16 schrieb Paul Menzel:
> > > 
> > > Dear James,
> > > 
> > > 
> > > Am 01.02.2018 um 16:24 schrieb James Bottomley:
> > > > 
> > > > On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote:
> > > > > 
> > > > > On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote:
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > On 02/01/18 13:16, James Bottomley wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Embarrassingly enough, I'm just on my way to do a TPM
> > > > > > > talk at
> > > > > > > FOSDEM.   I installed my shiny new 4.15 kernel on the
> > > > > > > 'plane and
> > > > > > > this is what I got after I arrived this morning:
> > > > > > > 
> > > > > > > jejb@jarvis:~> dmesg | grep -i tpm
> > > > > > > [    0.000000] ACPI: TPM2 0x0000000079446CC0 000034
> > > > > > > (v03        Tpm2Tabl 00000001 AMI  00000000)
> > > > > > > [    1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id
> > > > > > > 0xFE, rev-id 2)
> > > > > > > [    1.608863] tpm tpm0: A TPM error (2314) occurred
> > > > > > > continue selftest
> > > > > > > [    1.640052] tpm tpm0: A TPM error (2314) occurred
> > > > > > > continue selftest
> > > > > > > [    1.691215] tpm tpm0: A TPM error (2314) occurred
> > > > > > > continue selftest
> > > > > > > [    1.782377] tpm tpm0: A TPM error (2314) occurred
> > > > > > > continue selftest
> > > > > > > [    1.953539] tpm tpm0: A TPM error (2314) occurred
> > > > > > > continue selftest
> > > > > > > [    2.284701] tpm tpm0: A TPM error (2314) occurred
> > > > > > > continue selftest
> > > > > > > [    2.935743] tpm tpm0: A TPM error (2314) occurred
> > > > > > > continue selftest
> > > > > > > [    4.216236] tpm tpm0: TPM self test failed
> > > > > > > [    4.236829] ima: No TPM chip found, activating TPM-
> > > > > > > bypass! (rc=-19)
> > > > > > > 
> > > > > > > The error is TPM_RC_TESTING, which means it looks like we
> > > > > > > don't wait long enough for the selftests to complete.  I
> > > > > > > get this all the time booting with 4.15.  Fortunately I
> > > > > > > have a 4.13 backup kernel which is fine (otherwise I'd be
> > > > > > > a bit hosed since all my keys now require a TPM).
> > > > > > > 
> > > > > > > I'll debug on the train; my current suspicion is that the
> > > > > > > TPM_LONG duration might be a bit short for this chip (A
> > > > > > > nuvoton 6xx in a dell XPS-13).
> > > > > > 
> > > > > > Please join the thread [1], where I reported the same
> > > > > > problem for the Dell XPS 13 9360. Unfortunately, no
> > > > > > solution was found, especially, as I did not use the TPM.
> > > > > > Other owners of that system unfortunately didn’t have time
> > > > > > to report back if it work for them, so the “conclusion”
> > > > > > kind of was, that my TPM was broken, and had to be tested.
> > > > > 
> > > > > OK, I'll try to find a fix.  It's clearly a marginal problem
> > > > > since I've booted most -rc kernels without issue, so there's
> > > > > some slight timing change in 4.15 that triggered it.  It
> > > > > could also be a shutdown issue.  Any NV ram stuff deferred to
> > > > > start up would take a variable amount of time.
> > > > > 
> > > > > You'd almost think it's some sort of TPM self protest: the
> > > > > more stuff I use it for the more problems it seems to create.
> > > > >  I'm definitely motivated to fix it because without a TPM I
> > > > > can't actually do much with my laptop.
> > > > 
> > > > OK, I investigated but now my TPM has returned to normal (as in
> > > > it passes the selftest immediately).  There's clearly something
> > > > that makes it return TPM_RC_TESTING to every self test probe
> > > > for seconds at a time, but I don't know what it is.  Sending a
> > > > different command seems to cause the problem to clear (Managed
> > > > to reproduce once with the patch to verify), so this is my
> > > > proposed fix.  It's clearly nonsensical to detach the driver
> > > > because the self test still returns TPM_RC_TESTING,
> > > > so convert that return to a TPM_RC_SUCCESS on timeout.  It
> > > > prints a warning message so we'll see it in the logs if it
> > > > causes problems.   Given that this seems to be some type of
> > > > internal TPM issue, I don't believe changing the timings would
> > > > work.
> > > 
> > > Maybe Mario can confirm this issue too, now that Linux 4.15 is
> > > released. Maybe he also has a way to get the Nuvoton people
> > > involved.
> 
> James,
> 
> Did you actually experiment with changing the timings?

No, I already said: waiting 2s for a device driver init is already too
great a burden on the boot sequence.  I don't honestly think waiting
longer would help either ... 2s is a huge amount of time so there's
something else going on with the TPM.

James

> I was told that TPMs that are FIPS validated (such as that in the XPS
> 13) may take longer for the self tests to run.
> 
> > 
> > > 
> > > 
> > > > 
> > > > ---
> > > > 
> > > > diff --git a/drivers/char/tpm/tpm2-cmd.c
> > > > b/drivers/char/tpm/tpm2-cmd.c
> > > > index f40d20671a78..3e1b062d8888 100644
> > > > --- a/drivers/char/tpm/tpm2-cmd.c
> > > > +++ b/drivers/char/tpm/tpm2-cmd.c
> > > > @@ -872,6 +872,17 @@ static int tpm2_do_selftest(struct
> > > > tpm_chip *chip)
> > > >           /* wait longer the next round */
> > > >           delay_msec *= 2;
> > > >       }
> > > > +    if (rc == TPM2_RC_TESTING) {
> > > > +        /*
> > > > +         * A return of RC_TESTING means the TPM is still
> > > > +         * running self tests.  If one fails it will go into
> > > > +         * failure mode and return RC_FAILED to every command,
> > > > +         * so treat a still in testing return as a success
> > > > +         * rather than causing a driver detach.
> > > > +         */
> > > > +        dev_err(&chip->dev,"TPM: Still in testing mode after
> > > > %dms,
> > > > continuing\n", delay_msec);
> > > > +        rc = TPM2_RC_SUCCESS;
> > > > +    }
> > > >       return rc;
> > > >   }
> > > 
> > > Alexander replied the following in the other thread. No idea if
> > > you read
> > > it yet.
> > > 
> > > > 
> > > > The list of "A TPM error (2314) occurred continue selftest" is
> > > > caused
> > > > by my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm:
> > > > React
> > > > correctly to
> > > > RC_TESTING from TPM 2.0 self tests") [1]. 2314 is
> > > > TPM_RC_TESTING, so
> > > > the TPM
> > > > tells us that self-tests are still running in the background.
> > > > This
> > > > problem was
> > > > not visible in previous versions, since it (incorrectly)
> > > > ignored >
> > > > TPM_RC_TESTING.
> > > 
> > > Maybe the commit should be reverted for now until this has
> > > cleared up
> > > for the Dell XPS 13 9360(?) to adhere to Linux’ no regression
> > > policy.
> > > 
> > > 
> > > Kind regards,
> > > 
> > > Paul
> > > 
> > > 
> > > PS: Alexander will also be at FOSDEM and mentioned your talk [2].
> > > 
> > > 
> > > [1]
> > > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> > commit?id=125
> > a2210541079e8e7c69e629ad06cabed788f8[2]
> > > 
> > > https://lists.01.org/pipermail/tpm2/2018-January/000486.html
James Bottomley Feb. 2, 2018, 5:46 a.m. UTC | #10
On Thu, 2018-02-01 at 15:22 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 01, 2018 at 09:06:55PM +0000, Mario.Limonciello@dell.com
> wrote:
> 
> > 
> > I discussed this with some folks and although it would fix the
> > problem it is not accurately characterizing the situation.  What is
> > likely happening here is that issuing the self test command in
> > succession is causing the TPM to restart the self test and not
> > complete.  Instead the selfTestDone bit should be polled.
> 
> Oh, I wonder if that is beacuse this was copied from the TPM1 version
> where we don't have a bit like that??
> 
> Should we check this selfTestDone bit before event sending the self
> test?

The proposed patch already takes care of that by only sending the self
test once.  I suppose checking the selfTestDone bit and not sending a
self test at all could be an additional optimization on top, though.

James
Jarkko Sakkinen Feb. 8, 2018, 1:05 p.m. UTC | #11
On Thu, Feb 01, 2018 at 03:24:08PM +0000, James Bottomley wrote:
> On Thu, 2018-02-01 at 12:42 +0000, James Bottomley wrote:
> > On Thu, 2018-02-01 at 13:21 +0100, Paul Menzel wrote:
> > > 
> > > Dear James,
> > > 
> > > 
> > > On 02/01/18 13:16, James Bottomley wrote:
> > > > 
> > > > 
> > > > Embarrassingly enough, I'm just on my way to do a TPM talk at
> > > > FOSDEM.   I installed my shiny new 4.15 kernel on the 'plane and
> > > > this is what I got after I arrived this morning:
> > > > 
> > > > jejb@jarvis:~> dmesg | grep -i tpm
> > > > [    0.000000] ACPI: TPM2 0x0000000079446CC0 000034
> > > > (v03        Tpm2Tabl 00000001 AMI  00000000)
> > > > [    1.598059] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-
> > > > id 
> > > > 2)
> > > > [    1.608863] tpm tpm0: A TPM error (2314) occurred continue
> > > > selftest
> > > > [    1.640052] tpm tpm0: A TPM error (2314) occurred continue
> > > > selftest
> > > > [    1.691215] tpm tpm0: A TPM error (2314) occurred continue
> > > > selftest
> > > > [    1.782377] tpm tpm0: A TPM error (2314) occurred continue
> > > > selftest
> > > > [    1.953539] tpm tpm0: A TPM error (2314) occurred continue
> > > > selftest
> > > > [    2.284701] tpm tpm0: A TPM error (2314) occurred continue
> > > > selftest
> > > > [    2.935743] tpm tpm0: A TPM error (2314) occurred continue
> > > > selftest
> > > > [    4.216236] tpm tpm0: TPM self test failed
> > > > [    4.236829] ima: No TPM chip found, activating TPM-bypass!
> > > > (rc=-
> > > > 19)
> > > > 
> > > > The error is TPM_RC_TESTING, which means it looks like we don't
> > > > wait long enough for the selftests to complete.  I get this all
> > > > the time booting with 4.15.  Fortunately I have a 4.13 backup
> > > > kernel which is fine (otherwise I'd be a bit hosed since all my
> > > > keys now require a TPM).
> > > > 
> > > > I'll debug on the train; my current suspicion is that the
> > > > TPM_LONG duration might be a bit short for this chip (A nuvoton
> > > > 6xx in a dell XPS-13).
> > > 
> > > Please join the thread [1], where I reported the same problem for
> > > the Dell XPS 13 9360. Unfortunately, no solution was found,
> > > especially, as I did not use the TPM. Other owners of that system
> > > unfortunately didn’t have time to report back if it work for them,
> > > so the “conclusion” kind of was, that my TPM was broken, and had to
> > > be tested.
> > 
> > OK, I'll try to find a fix.  It's clearly a marginal problem since
> > I've booted most -rc kernels without issue, so there's some slight
> > timing change in 4.15 that triggered it.  It could also be a shutdown
> > issue.  Any NV ram stuff deferred to start up would take a variable
> > amount of time.
> > 
> > You'd almost think it's some sort of TPM self protest: the more stuff
> > I use it for the more problems it seems to create.  I'm definitely
> > motivated to fix it because without a TPM I can't actually do much
> > with my laptop.
> 
> OK, I investigated but now my TPM has returned to normal (as in it
> passes the selftest immediately).  There's clearly something that makes
> it return TPM_RC_TESTING to every self test probe for seconds at a
> time, but I don't know what it is.  Sending a different command seems
> to cause the problem to clear (Managed to reproduce once with the patch
> to verify), so this is my proposed fix.  It's clearly nonsensical to
> detach the driver because the self test still returns TPM_RC_TESTING,
> so convert that return to a TPM_RC_SUCCESS on timeout.  It prints a
> warning message so we'll see it in the logs if it causes problems.
>  Given that this seems to be some type of internal TPM issue, I don't
> believe changing the timings would work.

I don't think this is a sane rationale for a fix if the driver has
worked just fine on 4.14. It would be better to first identify the
commit that causes the regression and plan after that how to fix it.

/Jarkko
Jarkko Sakkinen Feb. 8, 2018, 1:18 p.m. UTC | #12
On Thu, Feb 01, 2018 at 08:16:26PM +0100, Paul Menzel wrote:
> > The list of "A TPM error (2314) occurred continue selftest" is caused by
> > my commit 125a2210541079e8e7c69e629ad06cabed788f8c ("tpm: React
> > correctly to
> > RC_TESTING from TPM 2.0 self tests") [1]. 2314 is TPM_RC_TESTING, so the TPM
> > tells us that self-tests are still running in the background. This problem was
> > not visible in previous versions, since it (incorrectly) ignored > TPM_RC_TESTING.
> 
> Maybe the commit should be reverted for now until this has cleared up for
> the Dell XPS 13 9360(?) to adhere to Linux’ no regression policy.

I think this should be done as an immediate action with Cc to stable.

/Jarkko
Ken Goldman Feb. 8, 2018, 4:53 p.m. UTC | #13
On 2/1/2018 3:12 PM, Mario.Limonciello@dell.com wrote:
> 
> I was told that TPMs that are FIPS validated (such as that in the XPS 13) may
> take longer for the self tests to run.

I don't understand why the SeftTest command should take longer.

I understand that a FIPS TPM must do all the self tests before it 
returns any results, while a non-FIPS must only test the algorithms 
required to return the result.  So, some other command will take longer.

~~

 From the TPM spec:

FIPS 140-2 requires that all power-on self-tests be complete before the 
TPM returns any value that depends on the results of a testable 
function. If compliance with FIPS 140-2 is required, any command that 
requires use of an untested function causes the TPM to operate as if 
TPM2_SelfTest(fullTest = NO) was received. The TPM returns 
TPM_RC_TESTING and continues to return TPM_RC_TESTING until all tests 
are complete. Alternatively, it may complete all tests and then complete 
the command. It may also return TPM_RC_NEEDS_TEST.
Ken Goldman Feb. 8, 2018, 5:27 p.m. UTC | #14
On 2/1/2018 12:40 PM, Jason Gunthorpe wrote:

> 
> But if a selftest returns TPM2_RC_TESTING I would expect the next
> command to also fail with a testing in progress code? At least by the
> spec..

It will return that code if the command requires an untested function.
For FIPS, the code is returned if there is any untested function.

Commands that don't require testable functions will succeed.

> The point of invoking selftest is to get to a state where future TPM
> commands will succeed, so returning immediately on RC_TESTING seems
> wrong?

Blocking vs. non-blocking is a TPM vendor implementation option.

Re., "seems wrong", the PC Client TPM changed the requirement, so that
a guaranteed non-blocking option is available.

~~
2.	On receipt of TPM2_SelfTest(fullTest == YES), the TPM SHALL perform a 
full self-test and return the result when all tests are complete.
~~

Of course, at some layer, the device drive is still polling, either for 
a success code (non-blocking) or a response (blocking).

Patch
diff mbox

diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index f40d20671a78..3e1b062d8888 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -872,6 +872,17 @@  static int tpm2_do_selftest(struct tpm_chip *chip)
 		/* wait longer the next round */
 		delay_msec *= 2;
 	}
+	if (rc == TPM2_RC_TESTING) {
+		/*
+		 * A return of RC_TESTING means the TPM is still
+		 * running self tests.  If one fails it will go into
+		 * failure mode and return RC_FAILED to every command,
+		 * so treat a still in testing return as a success
+		 * rather than causing a driver detach.
+		 */
+		dev_err(&chip->dev,"TPM: Still in testing mode after %dms, continuing\n", delay_msec);
+		rc = TPM2_RC_SUCCESS;
+	}
 
 	return rc;
 }