diff mbox

TPM selftest failure in 4.15

Message ID 1517515204.3145.51.camel@HansenPartnership.com (mailing list archive)
State New, archived
Headers show

Commit Message

James Bottomley Feb. 1, 2018, 8 p.m. UTC
On Thu, 2018-02-01 at 11:59 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 01, 2018 at 07:46:04PM +0100, James Bottomley wrote:
> 
> > 
> > I honestly don't think we should be waiting for the self test at
> > all.
> > We should kick it off and treat any TPM_RC_TESTING error as
> > -EAGAIN.
> > We're already under fire for slow boot sequences and adding 2s just
> > to
> > wait for the TPM to self test adds to that for no real value.
> 
> Arguably the BIOS should have completed the selftest - this stuff
> generally only exists to support embedded.
> 
> I don't like the idea of EAGAIN, that just expose all our users to
> this mess.
> 
> I would support making transmit_cmd genericly wait and retry if the
> TPM insists we need to wait for selftest to complete the specific
> command though.

OK, how about this then?

James

---

Comments

Jason Gunthorpe Feb. 1, 2018, 8:35 p.m. UTC | #1
On Thu, Feb 01, 2018 at 09:00:04PM +0100, James Bottomley wrote:
> On Thu, 2018-02-01 at 11:59 -0700, Jason Gunthorpe wrote:
> > On Thu, Feb 01, 2018 at 07:46:04PM +0100, James Bottomley wrote:
> > 
> > > 
> > > I honestly don't think we should be waiting for the self test at
> > > all.
> > > We should kick it off and treat any TPM_RC_TESTING error as
> > > -EAGAIN.
> > > We're already under fire for slow boot sequences and adding 2s just
> > > to
> > > wait for the TPM to self test adds to that for no real value.
> > 
> > Arguably the BIOS should have completed the selftest - this stuff
> > generally only exists to support embedded.
> > 
> > I don't like the idea of EAGAIN, that just expose all our users to
> > this mess.
> > 
> > I would support making transmit_cmd genericly wait and retry if the
> > TPM insists we need to wait for selftest to complete the specific
> > command though.
> 
> OK, how about this then?

Yeah, I like this concept much better, thanks

> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> index 1d6729be4cd6..84ed271c060b 100644
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -521,12 +521,32 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space,
>  	const struct tpm_output_header *header = buf;
>  	int err;
>  	ssize_t len;
> +	unsigned int delay_msec = 20;
>  
> -	len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
> -	if (len <  0)
> -		return len;
> +	/*
> +	 * on first probe we kick off a TPM self test in the
> +	 * background This means the TPM may return RC_TESTING to any
> +	 * command that tries to use a subsystem under test, so do an
> +	 * exponential backoff wait if that happens
> +	 */
> +	for (;;) {
> +		len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
> +		if (len <  0)
> +			return len;
> +
> +		err = be32_to_cpu(header->return_code);
> +		if (err != TPM2_RC_TESTING ||
> +		    (flags & TPM_TRANSMIT_NOWAIT))
> +			break;

Do TPM and TPM2 use a different return code here?

Jason
James Bottomley Feb. 1, 2018, 9:06 p.m. UTC | #2
On Thu, 2018-02-01 at 13:35 -0700, Jason Gunthorpe wrote:
> On Thu, Feb 01, 2018 at 09:00:04PM +0100, James Bottomley wrote:
> > 
> > On Thu, 2018-02-01 at 11:59 -0700, Jason Gunthorpe wrote:
> > > 
> > > On Thu, Feb 01, 2018 at 07:46:04PM +0100, James Bottomley wrote:
> > > 
> > > > 
> > > > 
> > > > I honestly don't think we should be waiting for the self test
> > > > at
> > > > all.
> > > > We should kick it off and treat any TPM_RC_TESTING error as
> > > > -EAGAIN.
> > > > We're already under fire for slow boot sequences and adding 2s
> > > > just
> > > > to
> > > > wait for the TPM to self test adds to that for no real value.
> > > 
> > > Arguably the BIOS should have completed the selftest - this stuff
> > > generally only exists to support embedded.
> > > 
> > > I don't like the idea of EAGAIN, that just expose all our users
> > > to
> > > this mess.
> > > 
> > > I would support making transmit_cmd genericly wait and retry if
> > > the
> > > TPM insists we need to wait for selftest to complete the specific
> > > command though.
> > 
> > OK, how about this then?
> 
> Yeah, I like this concept much better, thanks

Great, I'll put it through a few tests then send a formal patch.

> > diff --git a/drivers/char/tpm/tpm-interface.c
> > b/drivers/char/tpm/tpm-interface.c
> > index 1d6729be4cd6..84ed271c060b 100644
> > +++ b/drivers/char/tpm/tpm-interface.c
> > @@ -521,12 +521,32 @@ ssize_t tpm_transmit_cmd(struct tpm_chip
> > *chip, struct tpm_space *space,
> >  	const struct tpm_output_header *header = buf;
> >  	int err;
> >  	ssize_t len;
> > +	unsigned int delay_msec = 20;
> >  
> > -	len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
> > -	if (len <  0)
> > -		return len;
> > +	/*
> > +	 * on first probe we kick off a TPM self test in the
> > +	 * background This means the TPM may return RC_TESTING to
> > any
> > +	 * command that tries to use a subsystem under test, so do
> > an
> > +	 * exponential backoff wait if that happens
> > +	 */
> > +	for (;;) {
> > +		len = tpm_transmit(chip, space, (u8 *)buf, bufsiz,
> > flags);
> > +		if (len <  0)
> > +			return len;
> > +
> > +		err = be32_to_cpu(header->return_code);
> > +		if (err != TPM2_RC_TESTING ||
> > +		    (flags & TPM_TRANSMIT_NOWAIT))
> > +			break;
> 
> Do TPM and TPM2 use a different return code here?

Yes, TPM2_RC_TESTING is 0x90a and TPM_DOING_SELFTEST is 0x802

James
Jarkko Sakkinen Feb. 8, 2018, 1:10 p.m. UTC | #3
On Thu, Feb 01, 2018 at 09:00:04PM +0100, James Bottomley wrote:
> On Thu, 2018-02-01 at 11:59 -0700, Jason Gunthorpe wrote:
> > On Thu, Feb 01, 2018 at 07:46:04PM +0100, James Bottomley wrote:
> > 
> > > 
> > > I honestly don't think we should be waiting for the self test at
> > > all.
> > > We should kick it off and treat any TPM_RC_TESTING error as
> > > -EAGAIN.
> > > We're already under fire for slow boot sequences and adding 2s just
> > > to
> > > wait for the TPM to self test adds to that for no real value.
> > 
> > Arguably the BIOS should have completed the selftest - this stuff
> > generally only exists to support embedded.
> > 
> > I don't like the idea of EAGAIN, that just expose all our users to
> > this mess.
> > 
> > I would support making transmit_cmd genericly wait and retry if the
> > TPM insists we need to wait for selftest to complete the specific
> > command though.
> 
> OK, how about this then?
> 
> James

As long as there is no identified a regression it is a waste
of time to review these...

/Jarkko
James Bottomley Feb. 8, 2018, 5:02 p.m. UTC | #4
On Thu, 2018-02-08 at 15:10 +0200, Jarkko Sakkinen wrote:
> On Thu, Feb 01, 2018 at 09:00:04PM +0100, James Bottomley wrote:
> > 
> > On Thu, 2018-02-01 at 11:59 -0700, Jason Gunthorpe wrote:
> > > 
> > > On Thu, Feb 01, 2018 at 07:46:04PM +0100, James Bottomley wrote:
> > > 
> > > > 
> > > > 
> > > > I honestly don't think we should be waiting for the self test
> > > > at
> > > > all.
> > > > We should kick it off and treat any TPM_RC_TESTING error as
> > > > -EAGAIN.
> > > > We're already under fire for slow boot sequences and adding 2s
> > > > just
> > > > to
> > > > wait for the TPM to self test adds to that for no real value.
> > > 
> > > Arguably the BIOS should have completed the selftest - this stuff
> > > generally only exists to support embedded.
> > > 
> > > I don't like the idea of EAGAIN, that just expose all our users
> > > to
> > > this mess.
> > > 
> > > I would support making transmit_cmd genericly wait and retry if
> > > the
> > > TPM insists we need to wait for selftest to complete the specific
> > > command though.
> > 
> > OK, how about this then?
> > 
> > James
> 
> As long as there is no identified a regression it is a waste
> of time to review these...

There is an identified regression: the TPM driver will now periodically
fail to attach.  However, there's no point reviewing until we agree
what the fix is.  I was just waiting to verify this fixed my problem
(which means seeing the messages it spits out proving the TPM has
remained in self test).  I have now seen this and the driver still
works, so I can submit a formal patch.

James
Ken Goldman Feb. 8, 2018, 6:51 p.m. UTC | #5
On 2/1/2018 3:35 PM, Jason Gunthorpe wrote:

> Do TPM and TPM2 use a different return code here?

TPM 1.2 and TPM 2.0 return codes are in separate ranges.

See TPM2 Part 1 Figure 27 — Response Code Evaluation (and I've open 
sourced a command line - value to English convertor).
Jarkko Sakkinen Feb. 9, 2018, 10:02 a.m. UTC | #6
On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
> There is an identified regression: the TPM driver will now periodically
> fail to attach.  However, there's no point reviewing until we agree
> what the fix is.  I was just waiting to verify this fixed my problem
> (which means seeing the messages it spits out proving the TPM has
> remained in self test).  I have now seen this and the driver still
> works, so I can submit a formal patch.

For the self-test the duration falls down to 2 seconds as the specs do
not contain any well-defined duration for it, or at least I haven't
found it.

I see three alternative ways the fix the self-test:

1. Execute self-test with fullTest = YES.
2. Have a flag TPM_CHIP_TESTING that is set when the self-test is
   started. Issue a warning on time-out. Check for this flag in
   tpm_transmit_cmd() and tpm_write() and resend self-test command
   if the flag is stil test before the actual command. Return -EBUSY
   and print a warning if self-test is still active.
3. Increase the duration to the "correct" value if we have one.

/Jarkko
Nayna Feb. 9, 2018, 10:30 a.m. UTC | #7
On 02/09/2018 03:32 PM, Jarkko Sakkinen wrote:
> On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
>> There is an identified regression: the TPM driver will now periodically
>> fail to attach.  However, there's no point reviewing until we agree
>> what the fix is.  I was just waiting to verify this fixed my problem
>> (which means seeing the messages it spits out proving the TPM has
>> remained in self test).  I have now seen this and the driver still
>> works, so I can submit a formal patch.
> For the self-test the duration falls down to 2 seconds as the specs do
> not contain any well-defined duration for it, or at least I haven't
> found it.
>
> I see three alternative ways the fix the self-test:
>
> 1. Execute self-test with fullTest = YES.
> 2. Have a flag TPM_CHIP_TESTING that is set when the self-test is
>     started. Issue a warning on time-out. Check for this flag in
>     tpm_transmit_cmd() and tpm_write() and resend self-test command
>     if the flag is stil test before the actual command. Return -EBUSY
>     and print a warning if self-test is still active.
> 3. Increase the duration to the "correct" value if we have one.
>
> /Jarkko

IIUC, I see the duration and timeout specified in
TCG_PC_Client_Platform_TPM_Profile_PTP_Specification_Family_2.0_Revision:
Section 5.5.1.3 Command Timing: Table 15

For selftest it is given as:

Command                                      |  Duration[ms]    | 
Timeout[ms]

TPM2_SelfTest(fullTest=YES)                 1000 2000
TPM2_SelfTest(fullTest=NO) 20                        750

With that I think, TPM is expected to complete the selftest in max 2000 ms.

Thanks & Regards,
      - Nayna
Alexander Steffen Feb. 9, 2018, 11:47 a.m. UTC | #8
On 09.02.2018 11:02, Jarkko Sakkinen wrote:
> On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
>> There is an identified regression: the TPM driver will now periodically
>> fail to attach.  However, there's no point reviewing until we agree
>> what the fix is.  I was just waiting to verify this fixed my problem
>> (which means seeing the messages it spits out proving the TPM has
>> remained in self test).  I have now seen this and the driver still
>> works, so I can submit a formal patch.
> 
> For the self-test the duration falls down to 2 seconds as the specs do
> not contain any well-defined duration for it, or at least I haven't
> found it.
> 
> I see three alternative ways the fix the self-test:
> 
> 1. Execute self-test with fullTest = YES.

I had proposed some fixes in this direction last year:
https://patchwork.kernel.org/patch/10105483/
https://patchwork.kernel.org/patch/10130535/

Those combine the fast test execution with fullTest = NO for 
spec-compliant TPMs with a fallback to fullTest = YES.

> 2. Have a flag TPM_CHIP_TESTING that is set when the self-test is
>     started. Issue a warning on time-out. Check for this flag in
>     tpm_transmit_cmd() and tpm_write() and resend self-test command
>     if the flag is stil test before the actual command. Return -EBUSY
>     and print a warning if self-test is still active.
> 3. Increase the duration to the "correct" value if we have one.

What about option #4? Just not calling TPM2_SelfTest at all.

I had already proposed that solution in 
https://www.spinics.net/lists/linux-integrity/msg01007.html, but did not 
get much feedback. According to the specification, that won't lose any 
test coverage ("If a command requires use of an untested algorithm or 
functional module, the TPM performs the test and then completes the 
command actions." 
(https://trustedcomputinggroup.org/wp-content/uploads/TPM-Rev-2.0-Part-1-Architecture-01.38.pdf 
chapter 12.3, page 83/59).) and will definitely be the best option in 
terms of performance and complexity of the driver code.

Alexander
Mimi Zohar Feb. 9, 2018, 12:26 p.m. UTC | #9
On Fri, 2018-02-09 at 12:02 +0200, Jarkko Sakkinen wrote:
> On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
> > There is an identified regression: the TPM driver will now periodically
> > fail to attach.  However, there's no point reviewing until we agree
> > what the fix is.  I was just waiting to verify this fixed my problem
> > (which means seeing the messages it spits out proving the TPM has
> > remained in self test).  I have now seen this and the driver still
> > works, so I can submit a formal patch.
> 
> For the self-test the duration falls down to 2 seconds as the specs do
> not contain any well-defined duration for it, or at least I haven't
> found it.
> 
> I see three alternative ways the fix the self-test:
> 
> 1. Execute self-test with fullTest = YES.
> 2. Have a flag TPM_CHIP_TESTING that is set when the self-test is
>    started. Issue a warning on time-out. Check for this flag in
>    tpm_transmit_cmd() and tpm_write() and resend self-test command
>    if the flag is stil test before the actual command. Return -EBUSY
>    and print a warning if self-test is still active.
> 3. Increase the duration to the "correct" value if we have one.

Please take into account that the TPM must complete initialization
BEFORE IMA looks for the TPM, otherwise IMA goes into TPM-bypass mode.
 This consistently happens with a full self-test (option 1).
Increasing the wait duration will most likely cause this to happen as
well (option 2).

Based on James' patch description, which says "There are various
theories that resending the self-test command actually causes the
tests to restart and thus triggers more TPM_RC_TESTING returns until
the timeout is exceeded", I think IMA's detecting the TPM, by doing a
PCR read, will cause the self-test to restart (option 2).

Reverting the patch that enabled the TPM full self-test, or commenting
out self-test completely, allows IMA to properly find the TPM.

Side note, if the kernel emits TPM initialization warnings, there
should be a successfully completed message as well.

Mimi
James Bottomley Feb. 9, 2018, 4:18 p.m. UTC | #10
On Fri, 2018-02-09 at 12:02 +0200, Jarkko Sakkinen wrote:
> On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
> > 
> > There is an identified regression: the TPM driver will now
> > periodically
> > fail to attach.  However, there's no point reviewing until we agree
> > what the fix is.  I was just waiting to verify this fixed my
> > problem
> > (which means seeing the messages it spits out proving the TPM has
> > remained in self test).  I have now seen this and the driver still
> > works, so I can submit a formal patch.
> 
> For the self-test the duration falls down to 2 seconds as the specs
> do
> not contain any well-defined duration for it, or at least I haven't
> found it.
> 
> I see three alternative ways the fix the self-test:
> 
> 1. Execute self-test with fullTest = YES.
> 2. Have a flag TPM_CHIP_TESTING that is set when the self-test is
>    started. Issue a warning on time-out. Check for this flag in
>    tpm_transmit_cmd() and tpm_write() and resend self-test command
>    if the flag is stil test before the actual command. Return -EBUSY
>    and print a warning if self-test is still active.
> 3. Increase the duration to the "correct" value if we have one.

Actually, I disagree.  The first principle we got out of the discussion
was don't re-send the selftest command if the TPM says it's still
running self tests, so we definitely need only to send it once.

I think if the TPM returns returns RC_TESTING we continue on booting
and let it run selftest in the background.  We don't need a flag
because it will return RC_TESTING to any command that tries to exercise
a system under test.  So all we need to do is look for that return,
pause and retry up to a timeout.  If you look at the patch I submitted:

https://marc.info/?l=linux-integrity&m=151812288917753

That's pretty much what it does.

I really think adding more delay to boot up to try and determine when
the selftests "finish" is the wrong way to do this given that data from
the XPS-13 confirms this is somewhere above 2s, which is already a huge
boot wait.

James
James Bottomley Feb. 9, 2018, 4:23 p.m. UTC | #11
On Fri, 2018-02-09 at 07:26 -0500, Mimi Zohar wrote:
> On Fri, 2018-02-09 at 12:02 +0200, Jarkko Sakkinen wrote:
> > 
> > On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
> > > 
> > > There is an identified regression: the TPM driver will now
> > > periodically
> > > fail to attach.  However, there's no point reviewing until we
> > > agree
> > > what the fix is.  I was just waiting to verify this fixed my
> > > problem
> > > (which means seeing the messages it spits out proving the TPM has
> > > remained in self test).  I have now seen this and the driver
> > > still
> > > works, so I can submit a formal patch.
> > 
> > For the self-test the duration falls down to 2 seconds as the specs
> > do
> > not contain any well-defined duration for it, or at least I haven't
> > found it.
> > 
> > I see three alternative ways the fix the self-test:
> > 
> > 1. Execute self-test with fullTest = YES.
> > 2. Have a flag TPM_CHIP_TESTING that is set when the self-test is
> >    started. Issue a warning on time-out. Check for this flag in
> >    tpm_transmit_cmd() and tpm_write() and resend self-test command
> >    if the flag is stil test before the actual command. Return
> > -EBUSY
> >    and print a warning if self-test is still active.
> > 3. Increase the duration to the "correct" value if we have one.
> 
> Please take into account that the TPM must complete initialization
> BEFORE IMA looks for the TPM, otherwise IMA goes into TPM-bypass
> mode.  This consistently happens with a full self-test (option 1).
> Increasing the wait duration will most likely cause this to happen as
> well (option 2).
> 
> Based on James' patch description, which says "There are various
> theories that resending the self-test command actually causes the
> tests to restart and thus triggers more TPM_RC_TESTING returns until
> the timeout is exceeded", I think IMA's detecting the TPM, by doing a
> PCR read, will cause the self-test to restart (option 2).

Actually, no, only doing another selftest seems to restart.  Any other
command will either get a normal return (if the TPM has already tested
the subsystem) or an RC_TESTING return indicate it's still being
tested.  It seems that the PCRs are one of the earliest things to come
on-line, so IMA always works.  This is on my current laptop where I'm
running this patch with IMA: I no longer see the IMA bypass message and
IMA comes up normally.

The delay loop in transmit is instrumented, so I never see the TPM
return RC_TESTING to a PCR read even if it comes out of selftest as
RC_TESTING (I'll keep looking, though, it's only been a few days).

> Reverting the patch that enabled the TPM full self-test, or
> commenting out self-test completely, allows IMA to properly find the
> TPM.
> 
> Side note, if the kernel emits TPM initialization warnings, there
> should be a successfully completed message as well.

It would actually be really nice if we printed TPM identification
information as well (having just had to go over a load of systems and
answer the question what TPM do they have ...)

James
Mimi Zohar Feb. 9, 2018, 9:23 p.m. UTC | #12
On Fri, 2018-02-09 at 08:23 -0800, James Bottomley wrote:
> On Fri, 2018-02-09 at 07:26 -0500, Mimi Zohar wrote:
> > On Fri, 2018-02-09 at 12:02 +0200, Jarkko Sakkinen wrote:
> > > 
> > > On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
> > > > 
> > > > There is an identified regression: the TPM driver will now
> > > > periodically
> > > > fail to attach.  However, there's no point reviewing until we
> > > > agree
> > > > what the fix is.  I was just waiting to verify this fixed my
> > > > problem
> > > > (which means seeing the messages it spits out proving the TPM has
> > > > remained in self test).  I have now seen this and the driver
> > > > still
> > > > works, so I can submit a formal patch.
> > > 
> > > For the self-test the duration falls down to 2 seconds as the specs
> > > do
> > > not contain any well-defined duration for it, or at least I haven't
> > > found it.
> > > 
> > > I see three alternative ways the fix the self-test:
> > > 
> > > 1. Execute self-test with fullTest = YES.
> > > 2. Have a flag TPM_CHIP_TESTING that is set when the self-test is
> > >    started. Issue a warning on time-out. Check for this flag in
> > >    tpm_transmit_cmd() and tpm_write() and resend self-test command
> > >    if the flag is stil test before the actual command. Return
> > > -EBUSY
> > >    and print a warning if self-test is still active.
> > > 3. Increase the duration to the "correct" value if we have one.
> > 
> > Please take into account that the TPM must complete initialization
> > BEFORE IMA looks for the TPM, otherwise IMA goes into TPM-bypass
> > mode.  This consistently happens with a full self-test (option 1).
> > Increasing the wait duration will most likely cause this to happen as
> > well (option 2).
> > 
> > Based on James' patch description, which says "There are various
> > theories that resending the self-test command actually causes the
> > tests to restart and thus triggers more TPM_RC_TESTING returns until
> > the timeout is exceeded", I think IMA's detecting the TPM, by doing a
> > PCR read, will cause the self-test to restart (option 2).
> 
> Actually, no, only doing another selftest seems to restart.  Any other
> command will either get a normal return (if the TPM has already tested
> the subsystem) or an RC_TESTING return indicate it's still being
> tested.

Jarkko's option 2 suggested resending the self-test, thus my comment.

> It seems that the PCRs are one of the earliest things to come
> on-line, so IMA always works.  This is on my current laptop where I'm
> running this patch with IMA: I no longer see the IMA bypass message and
> IMA comes up normally.

Right, as long as the TPM isn't doing a full self-test.

Mimi
Jarkko Sakkinen Feb. 15, 2018, noon UTC | #13
On Fri, Feb 09, 2018 at 04:00:44PM +0530, Nayna Jain wrote:
> IIUC, I see the duration and timeout specified in
> TCG_PC_Client_Platform_TPM_Profile_PTP_Specification_Family_2.0_Revision:
> Section 5.5.1.3 Command Timing: Table 15
>
> For selftest it is given as:
>
> Command                                      |  Duration[ms]    |
> Timeout[ms]
>
> TPM2_SelfTest(fullTest=YES)                 1000 2000
> TPM2_SelfTest(fullTest=NO) 20                        750
>
> With that I think, TPM is expected to complete the selftest in max 2000 ms.

Thanks. So it looks that some TPMs are not spec compliant.

> Thanks & Regards,
>      - Nayna

/Jarkko
Jarkko Sakkinen Feb. 15, 2018, 12:12 p.m. UTC | #14
On Fri, Feb 09, 2018 at 12:47:10PM +0100, Alexander Steffen wrote:
> On 09.02.2018 11:02, Jarkko Sakkinen wrote:
> > On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
> > > There is an identified regression: the TPM driver will now periodically
> > > fail to attach.  However, there's no point reviewing until we agree
> > > what the fix is.  I was just waiting to verify this fixed my problem
> > > (which means seeing the messages it spits out proving the TPM has
> > > remained in self test).  I have now seen this and the driver still
> > > works, so I can submit a formal patch.
> > 
> > For the self-test the duration falls down to 2 seconds as the specs do
> > not contain any well-defined duration for it, or at least I haven't
> > found it.
> > 
> > I see three alternative ways the fix the self-test:
> > 
> > 1. Execute self-test with fullTest = YES.
> 
> I had proposed some fixes in this direction last year:
> https://patchwork.kernel.org/patch/10105483/
> https://patchwork.kernel.org/patch/10130535/
> 
> Those combine the fast test execution with fullTest = NO for spec-compliant
> TPMs with a fallback to fullTest = YES.

The first was accepted.

The 2nd wasn't accpeted mainly for reasons that for me only acceptable
dependency is:

1. Patch that is part of the same patch set.
2. A merged commit.

I didn't event look at the code for the second one at that point because
it was formally done wrong.

What it is doing would be acceptable for me. I still think that TPM
should be fully tested before letting IMA for example to use it.

/Jarkko
Mimi Zohar Feb. 15, 2018, 3:13 p.m. UTC | #15
On Thu, 2018-02-15 at 14:12 +0200, Jarkko Sakkinen wrote:
> On Fri, Feb 09, 2018 at 12:47:10PM +0100, Alexander Steffen wrote:
> > On 09.02.2018 11:02, Jarkko Sakkinen wrote:
> > > On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
> > > > There is an identified regression: the TPM driver will now periodically
> > > > fail to attach.  However, there's no point reviewing until we agree
> > > > what the fix is.  I was just waiting to verify this fixed my problem
> > > > (which means seeing the messages it spits out proving the TPM has
> > > > remained in self test).  I have now seen this and the driver still
> > > > works, so I can submit a formal patch.
> > > 
> > > For the self-test the duration falls down to 2 seconds as the specs do
> > > not contain any well-defined duration for it, or at least I haven't
> > > found it.
> > > 
> > > I see three alternative ways the fix the self-test:
> > > 
> > > 1. Execute self-test with fullTest = YES.
> > 
> > I had proposed some fixes in this direction last year:
> > https://patchwork.kernel.org/patch/10105483/
> > https://patchwork.kernel.org/patch/10130535/
> > 
> > Those combine the fast test execution with fullTest = NO for spec-compliant
> > TPMs with a fallback to fullTest = YES.
> 
> The first was accepted.
> 
> The 2nd wasn't accpeted mainly for reasons that for me only acceptable
> dependency is:
> 
> 1. Patch that is part of the same patch set.
> 2. A merged commit.
> 
> I didn't event look at the code for the second one at that point because
> it was formally done wrong.
> 
> What it is doing would be acceptable for me. I still think that TPM
> should be fully tested before letting IMA for example to use it.

Why?  The short selftest has worked fine up to now.  The full selftest
delays the TPM way too long and causes IMA to go into TPM-bypass mode.
 The faster the TPM is available, the better for IMA.

It seems all commands, except selftest, the code sleeps in a loop and
checks for the command to finish, but doesn't resend the command.
 Only for selftest is the command resent, instead of just waiting for
it to complete.  Is there any reason for this?

Mimi
Alexander Steffen Feb. 16, 2018, 6:27 p.m. UTC | #16
On 15.02.2018 13:12, Jarkko Sakkinen wrote:
> On Fri, Feb 09, 2018 at 12:47:10PM +0100, Alexander Steffen wrote:
>> On 09.02.2018 11:02, Jarkko Sakkinen wrote:
>>> On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
>>>> There is an identified regression: the TPM driver will now periodically
>>>> fail to attach.  However, there's no point reviewing until we agree
>>>> what the fix is.  I was just waiting to verify this fixed my problem
>>>> (which means seeing the messages it spits out proving the TPM has
>>>> remained in self test).  I have now seen this and the driver still
>>>> works, so I can submit a formal patch.
>>>
>>> For the self-test the duration falls down to 2 seconds as the specs do
>>> not contain any well-defined duration for it, or at least I haven't
>>> found it.
>>>
>>> I see three alternative ways the fix the self-test:
>>>
>>> 1. Execute self-test with fullTest = YES.
>>
>> I had proposed some fixes in this direction last year:
>> https://patchwork.kernel.org/patch/10105483/
>> https://patchwork.kernel.org/patch/10130535/
>>
>> Those combine the fast test execution with fullTest = NO for spec-compliant
>> TPMs with a fallback to fullTest = YES.
> 
> The first was accepted.
> 
> The 2nd wasn't accpeted mainly for reasons that for me only acceptable
> dependency is:
> 
> 1. Patch that is part of the same patch set.
> 2. A merged commit.
> 
> I didn't event look at the code for the second one at that point because
> it was formally done wrong.

Ah, sorry, I thought this was the easiest solution, since it seemed 
likely that the first patch would be merged at some point.

If a similar situation arises, should I then include the first patch in 
a series together with the second, even if that means that there will be 
two identical copies of the first patch (one from when it was first 
published, one as part of the new series)? Or should I just avoid the 
dependency in the second patch, though that will lead to merge conflicts 
when you want accept both patches?

Alexander
Alexander Steffen Feb. 16, 2018, 6:30 p.m. UTC | #17
On 15.02.2018 16:13, Mimi Zohar wrote:
> On Thu, 2018-02-15 at 14:12 +0200, Jarkko Sakkinen wrote:
>> On Fri, Feb 09, 2018 at 12:47:10PM +0100, Alexander Steffen wrote:
>>> On 09.02.2018 11:02, Jarkko Sakkinen wrote:
>>>> On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
>>>>> There is an identified regression: the TPM driver will now periodically
>>>>> fail to attach.  However, there's no point reviewing until we agree
>>>>> what the fix is.  I was just waiting to verify this fixed my problem
>>>>> (which means seeing the messages it spits out proving the TPM has
>>>>> remained in self test).  I have now seen this and the driver still
>>>>> works, so I can submit a formal patch.
>>>>
>>>> For the self-test the duration falls down to 2 seconds as the specs do
>>>> not contain any well-defined duration for it, or at least I haven't
>>>> found it.
>>>>
>>>> I see three alternative ways the fix the self-test:
>>>>
>>>> 1. Execute self-test with fullTest = YES.
>>>
>>> I had proposed some fixes in this direction last year:
>>> https://patchwork.kernel.org/patch/10105483/
>>> https://patchwork.kernel.org/patch/10130535/
>>>
>>> Those combine the fast test execution with fullTest = NO for spec-compliant
>>> TPMs with a fallback to fullTest = YES.
>>
>> The first was accepted.
>>
>> The 2nd wasn't accpeted mainly for reasons that for me only acceptable
>> dependency is:
>>
>> 1. Patch that is part of the same patch set.
>> 2. A merged commit.
>>
>> I didn't event look at the code for the second one at that point because
>> it was formally done wrong.
>>
>> What it is doing would be acceptable for me. I still think that TPM
>> should be fully tested before letting IMA for example to use it.
> 
> Why?  The short selftest has worked fine up to now.  The full selftest
> delays the TPM way too long and causes IMA to go into TPM-bypass mode.
>   The faster the TPM is available, the better for IMA.

I assume that IMA can deal with the TPM not returning RC_SUCCESS for a 
command, right? Then, if speed is the goal, not explicitly triggering 
any selftests and instead letting the TPM handle them internally still 
seems like the best solution. If the user needs a fully tested TPM for 
some reason, he can trigger the selftests himself. But the kernel cannot 
at the same time guarantee that all selftests pass (especially on 
broken/slow devices that take forever to complete them) and make the TPM 
available as early as possible.

I think we had a similar example elsewhere already: The kernel does not 
check all SMART results before accessing the hard disk, why should it do 
that for the TPM?

> It seems all commands, except selftest, the code sleeps in a loop and
> checks for the command to finish, but doesn't resend the command.
>   Only for selftest is the command resent, instead of just waiting for
> it to complete.  Is there any reason for this?

The selftest command is special in that the vendor can choose to have it 
execute its actions synchronously (like all other commands) or 
asynchronously (that is, return immediately and schedule the actions in 
the background). Only in the asynchronous case is the command repeated, 
as a simple way of querying the result. Since we call the command with 
fullTest=NO, only missing selftests are scheduled and with each 
iteration there should be fewer and fewer tests left to be executed, 
similar to how RC_YIELDED works.

Alexander
Nayna Feb. 19, 2018, 9:15 a.m. UTC | #18
On 02/17/2018 12:00 AM, Alexander Steffen wrote:
> On 15.02.2018 16:13, Mimi Zohar wrote:
>> On Thu, 2018-02-15 at 14:12 +0200, Jarkko Sakkinen wrote:
>>> On Fri, Feb 09, 2018 at 12:47:10PM +0100, Alexander Steffen wrote:
>>>> On 09.02.2018 11:02, Jarkko Sakkinen wrote:
>>>>> On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley wrote:
>>>>>> There is an identified regression: the TPM driver will now 
>>>>>> periodically
>>>>>> fail to attach.  However, there's no point reviewing until we agree
>>>>>> what the fix is.  I was just waiting to verify this fixed my problem
>>>>>> (which means seeing the messages it spits out proving the TPM has
>>>>>> remained in self test).  I have now seen this and the driver still
>>>>>> works, so I can submit a formal patch.
>>>>>
>>>>> For the self-test the duration falls down to 2 seconds as the 
>>>>> specs do
>>>>> not contain any well-defined duration for it, or at least I haven't
>>>>> found it.
>>>>>
>>>>> I see three alternative ways the fix the self-test:
>>>>>
>>>>> 1. Execute self-test with fullTest = YES.
>>>>
>>>> I had proposed some fixes in this direction last year:
>>>> https://patchwork.kernel.org/patch/10105483/
>>>> https://patchwork.kernel.org/patch/10130535/
>>>>
>>>> Those combine the fast test execution with fullTest = NO for 
>>>> spec-compliant
>>>> TPMs with a fallback to fullTest = YES.
>>>
>>> The first was accepted.
>>>
>>> The 2nd wasn't accpeted mainly for reasons that for me only acceptable
>>> dependency is:
>>>
>>> 1. Patch that is part of the same patch set.
>>> 2. A merged commit.
>>>
>>> I didn't event look at the code for the second one at that point 
>>> because
>>> it was formally done wrong.
>>>
>>> What it is doing would be acceptable for me. I still think that TPM
>>> should be fully tested before letting IMA for example to use it.
>>
>> Why?  The short selftest has worked fine up to now.  The full selftest
>> delays the TPM way too long and causes IMA to go into TPM-bypass mode.
>>   The faster the TPM is available, the better for IMA.
>
> I assume that IMA can deal with the TPM not returning RC_SUCCESS for a 
> command, right? Then, if speed is the goal, not explicitly triggering 
> any selftests and instead letting the TPM handle them internally still 
> seems like the best solution. If the user needs a fully tested TPM for 
> some reason, he can trigger the selftests himself. But the kernel 
> cannot at the same time guarantee that all selftests pass (especially 
> on broken/slow devices that take forever to complete them) and make 
> the TPM available as early as possible.
>
> I think we had a similar example elsewhere already: The kernel does 
> not check all SMART results before accessing the hard disk, why should 
> it do that for the TPM?
>
>> It seems all commands, except selftest, the code sleeps in a loop and
>> checks for the command to finish, but doesn't resend the command.
>>   Only for selftest is the command resent, instead of just waiting for
>> it to complete.  Is there any reason for this?
>
> The selftest command is special in that the vendor can choose to have 
> it execute its actions synchronously (like all other commands) or 
> asynchronously (that is, return immediately and schedule the actions 
> in the background). Only in the asynchronous case is the command 
> repeated, as a simple way of querying the result. Since we call the 
> command with fullTest=NO, only missing selftests are scheduled and 
> with each iteration there should be fewer and fewer tests left to be 
> executed, similar to how RC_YIELDED works.
To query for the result, it need not send self test command again, I 
think we can use TPM2_GetTestResult().
According to Spec, TPM Rev 2.0 Part 3 - Commands , Section 10.4, the 
command will return the following possible test results.

TPM_RC_NEEDS_TEST - SelfTest has not been executed.
TPM_RC_TESTING - Tests are still in progress
TPM_RC_FAILURE - Tests Failure.
TPM_RC_SUCCESS - Tests successfully completed.

Thanks & Regards,
     - Nayna

>
> Alexander
>
Jason Gunthorpe Feb. 19, 2018, 10:26 p.m. UTC | #19
On Mon, Feb 19, 2018 at 02:45:31PM +0530, Nayna Jain wrote:

> To query for the result, it need not send self test command again, I think
> we can use TPM2_GetTestResult().
> According to Spec, TPM Rev 2.0 Part 3 - Commands , Section 10.4, the command
> will return the following possible test results.
> 
> TPM_RC_NEEDS_TEST - SelfTest has not been executed.
> TPM_RC_TESTING - Tests are still in progress
> TPM_RC_FAILURE - Tests Failure.
> TPM_RC_SUCCESS - Tests successfully completed.

That sounds like a great way to fix this TPM without creating a whole
new flow..

TPM1 doesn't have this command which is why the self test loop was
written with repeated self test, it just got copied that way when
someone implemented the same thing for TPM2..

Jason>
Jarkko Sakkinen Feb. 20, 2018, 1:05 p.m. UTC | #20
On Fri, 2018-02-16 at 19:27 +0100, Alexander Steffen wrote:
> On 15.02.2018 13:12, Jarkko Sakkinen wrote:
> > On Fri, Feb 09, 2018 at 12:47:10PM +0100, Alexander Steffen wrote:
> > > On 09.02.2018 11:02, Jarkko Sakkinen wrote:
> > > > On Thu, Feb 08, 2018 at 09:02:00AM -0800, James Bottomley
> > > > wrote:
> > > > > There is an identified regression: the TPM driver will now
> > > > > periodically
> > > > > fail to attach.  However, there's no point reviewing until we
> > > > > agree
> > > > > what the fix is.  I was just waiting to verify this fixed my
> > > > > problem
> > > > > (which means seeing the messages it spits out proving the TPM
> > > > > has
> > > > > remained in self test).  I have now seen this and the driver
> > > > > still
> > > > > works, so I can submit a formal patch.
> > > > 
> > > > For the self-test the duration falls down to 2 seconds as the
> > > > specs do
> > > > not contain any well-defined duration for it, or at least I
> > > > haven't
> > > > found it.
> > > > 
> > > > I see three alternative ways the fix the self-test:
> > > > 
> > > > 1. Execute self-test with fullTest = YES.
> > > 
> > > I had proposed some fixes in this direction last year:
> > > https://patchwork.kernel.org/patch/10105483/
> > > https://patchwork.kernel.org/patch/10130535/
> > > 
> > > Those combine the fast test execution with fullTest = NO for
> > > spec-compliant
> > > TPMs with a fallback to fullTest = YES.
> > 
> > The first was accepted.
> > 
> > The 2nd wasn't accpeted mainly for reasons that for me only
> > acceptable
> > dependency is:
> > 
> > 1. Patch that is part of the same patch set.
> > 2. A merged commit.
> > 
> > I didn't event look at the code for the second one at that point
> > because
> > it was formally done wrong.
> 
> Ah, sorry, I thought this was the easiest solution, since it seemed 
> likely that the first patch would be merged at some point.
> 
> If a similar situation arises, should I then include the first patch
> in 
> a series together with the second, even if that means that there will
> be 
> two identical copies of the first patch (one from when it was first 
> published, one as part of the new series)? Or should I just avoid
> the 
> dependency in the second patch, though that will lead to merge
> conflicts 
> when you want accept both patches?
> 
> Alexander

Yes, it would be best to include all patches to the patch set that
have not yet been merged in order to make it self-contained and
easy test and review.

/Jarkko
Ken Goldman April 8, 2018, 6:27 p.m. UTC | #21
On 2/9/2018 11:23 AM, James Bottomley wrote:
> On Fri, 2018-02-09 at 07:26 -0500, Mimi Zohar wrote:
>> On Fri, 2018-02-09 at 12:02 +0200, Jarkko Sakkinen wrote:

> It seems that the PCRs are one of the earliest things to come
> on-line, so IMA always works.  This is on my current laptop where I'm
> running this patch with IMA: I no longer see the IMA bypass message and
> IMA comes up normally.
> 

I would expect PCRs to be available by the time IMA needs them because 
the firmware also extends to PCRs.  Therefore, the hash algorithms will 
be tested very early in boot.

I only worry about the resume from suspend case, where firmware won't be
doing extends.
diff mbox

Patch

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 1d6729be4cd6..84ed271c060b 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -521,12 +521,32 @@  ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space,
 	const struct tpm_output_header *header = buf;
 	int err;
 	ssize_t len;
+	unsigned int delay_msec = 20;
 
-	len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
-	if (len <  0)
-		return len;
+	/*
+	 * on first probe we kick off a TPM self test in the
+	 * background This means the TPM may return RC_TESTING to any
+	 * command that tries to use a subsystem under test, so do an
+	 * exponential backoff wait if that happens
+	 */
+	for (;;) {
+		len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
+		if (len <  0)
+			return len;
+
+		err = be32_to_cpu(header->return_code);
+		if (err != TPM2_RC_TESTING ||
+		    (flags & TPM_TRANSMIT_NOWAIT))
+			break;
+
+		delay_msec *= 2;
+		if (delay_msec > TPM2_DURATION_LONG) {
+			dev_err(&chip->dev,"TPM: still running self tests, giving up waiting\n");
+			break;
+		}
+		tpm_msleep(delay_msec);
+	}
 
-	err = be32_to_cpu(header->return_code);
 	if (err != 0 && desc)
 		dev_err(&chip->dev, "A TPM error (%d) occurred %s\n", err,
 			desc);
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 528cffbd49d3..47c5a5206325 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -495,6 +495,7 @@  extern struct idr dev_nums_idr;
 enum tpm_transmit_flags {
 	TPM_TRANSMIT_UNLOCKED	= BIT(0),
 	TPM_TRANSMIT_RAW	= BIT(1),
+	TPM_TRANSMIT_NOWAIT	= BIT(2),
 };
 
 ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index f40d20671a78..106c126b4fe0 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -849,28 +849,24 @@  static const struct tpm_input_header tpm2_selftest_header = {
 static int tpm2_do_selftest(struct tpm_chip *chip)
 {
 	int rc;
-	unsigned int delay_msec = 20;
-	long duration;
 	struct tpm2_cmd cmd;
 
-	duration = jiffies_to_msecs(
-		tpm2_calc_ordinal_duration(chip, TPM2_CC_SELF_TEST));
-
-	while (duration > 0) {
-		cmd.header.in = tpm2_selftest_header;
-		cmd.params.selftest_in.full_test = 0;
-
-		rc = tpm_transmit_cmd(chip, NULL, &cmd, TPM2_SELF_TEST_IN_SIZE,
-				      0, 0, "continue selftest");
-
-		if (rc != TPM2_RC_TESTING)
-			break;
-
-		tpm_msleep(delay_msec);
-		duration -= delay_msec;
-
-		/* wait longer the next round */
-		delay_msec *= 2;
+	cmd.header.in = tpm2_selftest_header;
+	cmd.params.selftest_in.full_test = 0;
+
+	rc = tpm_transmit_cmd(chip, NULL, &cmd, TPM2_SELF_TEST_IN_SIZE,
+			      0, TPM_TRANSMIT_NOWAIT, "continue selftest");
+
+	if (rc == TPM2_RC_TESTING) {
+		/*
+		 * A return of RC_TESTING means the TPM is still
+		 * running self tests.  If one fails it will go into
+		 * failure mode and return RC_FAILED to every command,
+		 * so treat a still in testing return as a success
+		 * rather than causing a driver detach.
+		 */
+		dev_info(&chip->dev,"TPM: Running self test in background\n");
+		rc = TPM2_RC_SUCCESS;
 	}
 
 	return rc;