diff mbox

[v3] tpm: fix intermittent failure with self tests

Message ID 1519043040.4721.5.camel@HansenPartnership.com (mailing list archive)
State New, archived
Headers show

Commit Message

James Bottomley Feb. 19, 2018, 12:24 p.m. UTC
Ever since

commit 2482b1bba5122b1d5516c909832bdd282015b8e9
Author: Alexander Steffen <Alexander.Steffen@infineon.com>
Date:   Thu Aug 31 19:18:56 2017 +0200

    tpm: Trigger only missing TPM 2.0 self tests

My Nuvoton 6xx in a Dell XPS-13 has been intermittently failing to
work (necessitating a reboot). The problem seems to be that the TPM
gets into a state where the partial self-test doesn't return
TPM_RC_SUCCESS (meaning all tests have run to completion), but instead
returns TPM_RC_TESTING (meaning some tests are still running in the
background).  There are various theories that resending the self-test
command actually causes the tests to restart and thus triggers more
TPM_RC_TESTING returns until the timeout is exceeded.

There are several issues here: firstly being we shouldn't slow down
the boot sequence waiting for the self test to complete once the TPM
backgrounds them.  It will actually make available all functions that
have passed and if it gets a failure return TPM_RC_FAILURE to every
subsequent command.  So the fix is to kick off self tests once and if
they return TPM_RC_TESTING log that as a backgrounded self test and
continue on.  In order to prevent other tpm users from seeing any
TPM_RC_TESTING returns (which it might if they send a command that
needs a TPM subsystem which is still under test), we loop in
tpm_transmit_cmd until either a timeout or we don't get a
TPM_RC_TESTING return.

Finally, there have been observations of strange returns from a
partial test. One Nuvoton is occasionally returning
TPM_RC_COMMAND_CODE, so treat any unexpected return from a partial
self test as an indication we need to run a full self test.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fixes: 2482b1bba5122b1d5516c909832bdd282015b8e9

---
v3: - must pass through TPM2_RC_INTIALIZE
v2: - review feedback: encapsulate transmission and rename NOWAIT constant
    - Rewrite selftest routine to run full test on any unexpected
      (meaning not TPM2_RC_TESTING, TPM2_RC_SUCCESS or
      TPM2_RC_FAILURE) from the partial selftest.
---
 drivers/char/tpm/tpm-interface.c | 37 ++++++++++++++++++++++++++++-
 drivers/char/tpm/tpm.h           |  6 +++--
 drivers/char/tpm/tpm2-cmd.c      | 50 ++++++++++++++++++++++++++++++----------
 3 files changed, 78 insertions(+), 15 deletions(-)

Comments

Jarkko Sakkinen Feb. 20, 2018, 5:43 p.m. UTC | #1
I get merge conflicts in my tree but I'll review this.

On Mon, 2018-02-19 at 07:24 -0500, James Bottomley wrote:
> Ever since
> 
> commit 2482b1bba5122b1d5516c909832bdd282015b8e9
> Author: Alexander Steffen <Alexander.Steffen@infineon.com>
> Date:   Thu Aug 31 19:18:56 2017 +0200
> 
>     tpm: Trigger only missing TPM 2.0 self tests
> 
> My Nuvoton 6xx in a Dell XPS-13 has been intermittently failing to
> work (necessitating a reboot). The problem seems to be that the TPM
> gets into a state where the partial self-test doesn't return
> TPM_RC_SUCCESS (meaning all tests have run to completion), but instead
> returns TPM_RC_TESTING (meaning some tests are still running in the
> background).  There are various theories that resending the self-test
> command actually causes the tests to restart and thus triggers more
> TPM_RC_TESTING returns until the timeout is exceeded.
> 
> There are several issues here: firstly being we shouldn't slow down
> the boot sequence waiting for the self test to complete once the TPM
> backgrounds them.  It will actually make available all functions that
> have passed and if it gets a failure return TPM_RC_FAILURE to every
> subsequent command.  So the fix is to kick off self tests once and if
> they return TPM_RC_TESTING log that as a backgrounded self test and
> continue on.  In order to prevent other tpm users from seeing any
> TPM_RC_TESTING returns (which it might if they send a command that
> needs a TPM subsystem which is still under test), we loop in
> tpm_transmit_cmd until either a timeout or we don't get a
> TPM_RC_TESTING return.
> 
> Finally, there have been observations of strange returns from a
> partial test. One Nuvoton is occasionally returning
> TPM_RC_COMMAND_CODE, so treat any unexpected return from a partial
> self test as an indication we need to run a full self test.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> Fixes: 2482b1bba5122b1d5516c909832bdd282015b8e9
> 
> ---
> v3: - must pass through TPM2_RC_INTIALIZE
> v2: - review feedback: encapsulate transmission and rename NOWAIT constant
>     - Rewrite selftest routine to run full test on any unexpected
>       (meaning not TPM2_RC_TESTING, TPM2_RC_SUCCESS or
>       TPM2_RC_FAILURE) from the partial selftest.
> ---
>  drivers/char/tpm/tpm-interface.c | 37 ++++++++++++++++++++++++++++-
>  drivers/char/tpm/tpm.h           |  6 +++--
>  drivers/char/tpm/tpm2-cmd.c      | 50 ++++++++++++++++++++++++++++++---------
> -
>  3 files changed, 78 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-
> interface.c
> index 9e80a953d693..b9a8914b05d9 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -515,6 +515,41 @@ ssize_t tpm_transmit(struct tpm_chip *chip, struct
> tpm_space *space,
>  	return rc ? rc : len;
>  }
>  
> +static ssize_t tpm_transmit_check(struct tpm_chip *chip,
> +				  struct tpm_space *space, const void *buf,
> +				  size_t bufsiz, unsigned int flags)

I would like this function to be renamed as tpm_transmit() and the
current tpm_transmit() as tpm_try_transmit(). That would be a more
understandable naming. I think also that both functions would
deserve documentation because the behavior is starting to become
fairly complex.

> +{
> +	const struct tpm_output_header *header = buf;
> +	unsigned int delay_msec = 20;
> +	ssize_t len;
> +	u32 err;
> +
> +	/*
> +	 * on first probe we kick off a TPM self test in the
> +	 * background This means the TPM may return RC_TESTING to any
> +	 * command that tries to use a subsystem under test, so do an
> +	 * exponential backoff wait if that happens
> +	 */

This could be part of kdoc.

> +	for (;;) {
> +		len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
> +		if (len <  0)
> +			return len;
> +
> +		err = be32_to_cpu(header->return_code);
> +		if (err != TPM2_RC_TESTING ||
> +		    (flags & TPM_TRANSMIT_NO_WAIT_TESTING))
> +			break;
> +
> +		delay_msec *= 2;
> +		if (delay_msec > TPM2_DURATION_LONG) {
> +			dev_err(&chip->dev,"TPM: still running self tests,
> giving up waiting\n");

's/TPM://'

> +			break;
> +		}
> +		tpm_msleep(delay_msec);
> +	}
> +	return len;
> +}
> +
>  /**
>   * tmp_transmit_cmd - send a tpm command to the device
>   *    The function extracts tpm out header return code
> @@ -540,7 +575,7 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct
> tpm_space *space,
>  	int err;
>  	ssize_t len;
>  
> -	len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
> +	len = tpm_transmit_check(chip, space, buf, bufsiz, flags);
>  	if (len <  0)
>  		return len;
>  
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index f895fba4e20d..3e083a30a108 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -104,6 +104,7 @@ enum tpm2_return_codes {
>  	TPM2_RC_HASH		= 0x0083, /* RC_FMT1 */
>  	TPM2_RC_HANDLE		= 0x008B,
>  	TPM2_RC_INITIALIZE	= 0x0100, /* RC_VER1 */
> +	TPM2_RC_FAILURE		= 0x0101,
>  	TPM2_RC_DISABLED	= 0x0120,
>  	TPM2_RC_COMMAND_CODE    = 0x0143,
>  	TPM2_RC_TESTING		= 0x090A, /* RC_WARN */
> @@ -499,8 +500,9 @@ extern const struct file_operations tpmrm_fops;
>  extern struct idr dev_nums_idr;
>  
>  enum tpm_transmit_flags {
> -	TPM_TRANSMIT_UNLOCKED	= BIT(0),
> -	TPM_TRANSMIT_RAW	= BIT(1),
> +	TPM_TRANSMIT_UNLOCKED		= BIT(0),
> +	TPM_TRANSMIT_RAW		= BIT(1),
> +	TPM_TRANSMIT_NO_WAIT_TESTING	= BIT(2),
>  };
>  
>  ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
> index e5052e2f1924..13d9e74084aa 100644
> --- a/drivers/char/tpm/tpm2-cmd.c
> +++ b/drivers/char/tpm/tpm2-cmd.c
> @@ -838,29 +838,55 @@ EXPORT_SYMBOL_GPL(tpm2_calc_ordinal_duration);
>  static int tpm2_do_selftest(struct tpm_chip *chip)
>  {
>  	int rc;
> -	unsigned int delay_msec = 10;
> -	long duration;
>  	struct tpm_buf buf;
> +	int full_test;
>  
> -	duration = jiffies_to_msecs(
> -		tpm2_calc_ordinal_duration(chip, TPM2_CC_SELF_TEST));
> +	for (full_test = 0; full_test < 2; full_test++) {
> +		const char *test_str = full_test ?
> +			"full selftest" : "incremental selftest";

This code is too "clever" without much of a real purpose. A better
idea would be to have a helper function for sending the self-test
command and just call it twice.

>  
> -	for (;;) {
>  		rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS,
> TPM2_CC_SELF_TEST);
>  		if (rc)
>  			return rc;
>  
> -		/* Perform tests in the background. */
> -		tpm_buf_append_u8(&buf, 0);
> +		tpm_buf_append_u8(&buf, full_test);
> +		dev_info(&chip->dev, "TPM: running %s\n", test_str);
>  		rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0,
> -				      "continue selftest");
> +				      test_str);
>  		tpm_buf_destroy(&buf);
> -		if (rc != TPM2_RC_TESTING || delay_msec >= duration)
> +
> +		if (rc == TPM2_RC_INITIALIZE)
> +			/* TPM needs startup command */
> +			return rc;
> +
> +		if (rc == TPM2_RC_TESTING) {
> +			/*
> +			 * A return of RC_TESTING means the TPM is still
> +			 * running self tests.  If one fails it will go into
> +			 * failure mode and return RC_FAILED to every
> command,
> +			 * so treat a still in testing return as a success
> +			 * rather than causing a driver detach.
> +			 */
> +			dev_info(&chip->dev, "TPM: Running self test in
> background\n");
> +			rc = TPM2_RC_SUCCESS;
> +		}
> +		if (rc == TPM2_RC_SUCCESS)
>  			break;
>  
> -		/* wait longer than before */
> -		delay_msec *= 2;
> -		tpm_msleep(delay_msec);
> +		dev_info(&chip->dev, "TPM: %s failed\n", test_str);
> +	}
> +
> +	if (rc == TPM2_RC_SUCCESS) {
> +		dev_info(&chip->dev, "TPM: selftest succeeded\n");
> +	} else if (rc == TPM2_RC_FAILURE) {
> +		dev_err(&chip->dev, "TPM: selftest failed\n");
> +	} else {
> +		/*
> +		 * Any other error code is unexpected but should not
> +		 * prevent the driver from attaching
> +		 */
> +		dev_err(&chip->dev, "TPM: selftest returns 0x%x, continuing
> anyway\n", rc);
> +		rc = TPM2_RC_SUCCESS;
>  	}

Are these messages mandatory given that tpm_transmit_cmd() already
prints an error message? Could some or all of them removed?

Some other notes:

* tpm_transmit_cmd() uses %d as formatting while this uses %x
* Here it is called "selftest" and in the messages above it is
  called "self test".

/Jarkko
James Bottomley Feb. 20, 2018, 5:51 p.m. UTC | #2
On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> I get merge conflicts in my tree but I'll review this.

You told me to rebase it on that patch that wasn't in your tree:

https://patchwork.kernel.org/patch/10208965/

If you put it in your tree, I can just rebase on top of everything.

James
Jarkko Sakkinen Feb. 20, 2018, 7:26 p.m. UTC | #3
On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
> On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> > I get merge conflicts in my tree but I'll review this.
> 
> You told me to rebase it on that patch that wasn't in your tree:
> 
> https://patchwork.kernel.org/patch/10208965/
> 
> If you put it in your tree, I can just rebase on top of everything.
> 
> James

Aah. Sorry, my bad I'll move forward on testing :-)

/Jarkko
Jarkko Sakkinen Feb. 20, 2018, 9:27 p.m. UTC | #4
On Tue, Feb 20, 2018 at 09:26:00PM +0200, Jarkko Sakkinen wrote:
> On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
> > On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> > > I get merge conflicts in my tree but I'll review this.
> > 
> > You told me to rebase it on that patch that wasn't in your tree:
> > 
> > https://patchwork.kernel.org/patch/10208965/
> > 
> > If you put it in your tree, I can just rebase on top of everything.
> > 
> > James
> 
> Aah. Sorry, my bad I'll move forward on testing :-)
> 
> /Jarkko

Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

/Jarkko
Jarkko Sakkinen Feb. 20, 2018, 9:28 p.m. UTC | #5
On Tue, Feb 20, 2018 at 11:27:25PM +0200, Jarkko Sakkinen wrote:
> On Tue, Feb 20, 2018 at 09:26:00PM +0200, Jarkko Sakkinen wrote:
> > On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
> > > On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> > > > I get merge conflicts in my tree but I'll review this.
> > > 
> > > You told me to rebase it on that patch that wasn't in your tree:
> > > 
> > > https://patchwork.kernel.org/patch/10208965/
> > > 
> > > If you put it in your tree, I can just rebase on top of everything.
> > > 
> > > James
> > 
> > Aah. Sorry, my bad I'll move forward on testing :-)
> > 
> > /Jarkko
> 
> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> 
> /Jarkko

These must be taken away:

[    2.843641] tpm tpm0: TPM: running incremental selftest
[    2.854087] tpm tpm0: TPM: selftest succeeded

/Jarkko
Jarkko Sakkinen Feb. 20, 2018, 9:29 p.m. UTC | #6
On Tue, Feb 20, 2018 at 11:28:47PM +0200, Jarkko Sakkinen wrote:
> On Tue, Feb 20, 2018 at 11:27:25PM +0200, Jarkko Sakkinen wrote:
> > On Tue, Feb 20, 2018 at 09:26:00PM +0200, Jarkko Sakkinen wrote:
> > > On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
> > > > On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> > > > > I get merge conflicts in my tree but I'll review this.
> > > > 
> > > > You told me to rebase it on that patch that wasn't in your tree:
> > > > 
> > > > https://patchwork.kernel.org/patch/10208965/
> > > > 
> > > > If you put it in your tree, I can just rebase on top of everything.
> > > > 
> > > > James
> > > 
> > > Aah. Sorry, my bad I'll move forward on testing :-)
> > > 
> > > /Jarkko
> > 
> > Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > 
> > /Jarkko
> 
> These must be taken away:
> 
> [    2.843641] tpm tpm0: TPM: running incremental selftest
> [    2.854087] tpm tpm0: TPM: selftest succeeded
> 
> /Jarkko

Right and I tested this with Kabylake XPS13/PTT.

/Jarkko
James Bottomley Feb. 20, 2018, 11:09 p.m. UTC | #7
On Tue, 2018-02-20 at 23:28 +0200, Jarkko Sakkinen wrote:
> On Tue, Feb 20, 2018 at 11:27:25PM +0200, Jarkko Sakkinen wrote:
> > 
> > On Tue, Feb 20, 2018 at 09:26:00PM +0200, Jarkko Sakkinen wrote:
> > > 
> > > On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
> > > > 
> > > > On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> > > > > 
> > > > > I get merge conflicts in my tree but I'll review this.
> > > > 
> > > > You told me to rebase it on that patch that wasn't in your
> > > > tree:
> > > > 
> > > > https://patchwork.kernel.org/patch/10208965/
> > > > 
> > > > If you put it in your tree, I can just rebase on top of
> > > > everything.
> > > > 
> > > > James
> > > 
> > > Aah. Sorry, my bad I'll move forward on testing :-)
> > > 
> > > /Jarkko
> > 
> > Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > 
> > /Jarkko
> 
> These must be taken away:
> 
> [    2.843641] tpm tpm0: TPM: running incremental selftest
> [    2.854087] tpm tpm0: TPM: selftest succeeded

I'm not wedded to having the messages, but it helps give context when
something like this happens:

[    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
[    1.550108] tpm tpm0: TPM: running incremental selftest
[    1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest
[    1.602320] tpm tpm0: TPM: incremental selftest failed
[    1.602322] tpm tpm0: TPM: running full selftest
[    2.523689] tpm tpm0: TPM: selftest succeeded

It also helps explain why I lost a second in the boot sequence to the
TPM having to run a full self test.

Without the chatty messages, what you see is 

[    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
[
    1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest

Which is a bit harder to interpret.

James
Paul Menzel Feb. 21, 2018, 7:04 a.m. UTC | #8
Dear James,


Am 19.02.2018 um 13:24 schrieb James Bottomley:
> Ever since
> 
> commit 2482b1bba5122b1d5516c909832bdd282015b8e9
> Author: Alexander Steffen <Alexander.Steffen@infineon.com>
> Date:   Thu Aug 31 19:18:56 2017 +0200
> 
>      tpm: Trigger only missing TPM 2.0 self tests
> 
> My Nuvoton 6xx in a Dell XPS-13 has been intermittently failing to
> work (necessitating a reboot). The problem seems to be that the TPM
> gets into a state where the partial self-test doesn't return
> TPM_RC_SUCCESS (meaning all tests have run to completion), but instead
> returns TPM_RC_TESTING (meaning some tests are still running in the
> background).  There are various theories that resending the self-test
> command actually causes the tests to restart and thus triggers more
> TPM_RC_TESTING returns until the timeout is exceeded.
> 
> There are several issues here: firstly being we shouldn't slow down
> the boot sequence waiting for the self test to complete once the TPM
> backgrounds them.  It will actually make available all functions that
> have passed and if it gets a failure return TPM_RC_FAILURE to every
> subsequent command.  So the fix is to kick off self tests once and if
> they return TPM_RC_TESTING log that as a backgrounded self test and
> continue on.  In order to prevent other tpm users from seeing any
> TPM_RC_TESTING returns (which it might if they send a command that
> needs a TPM subsystem which is still under test), we loop in
> tpm_transmit_cmd until either a timeout or we don't get a
> TPM_RC_TESTING return.
> 
> Finally, there have been observations of strange returns from a
> partial test. One Nuvoton is occasionally returning
> TPM_RC_COMMAND_CODE, so treat any unexpected return from a partial
> self test as an indication we need to run a full self test.
> 
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> Fixes: 2482b1bba5122b1d5516c909832bdd282015b8e9

Thank you for fixing the regression. Could you tag this for stable too, 
or does this subsystem handle this differently?


Kind regards,

Paul
Jarkko Sakkinen Feb. 21, 2018, 7:25 a.m. UTC | #9
On Tue, Feb 20, 2018 at 06:09:51PM -0500, James Bottomley wrote:
> On Tue, 2018-02-20 at 23:28 +0200, Jarkko Sakkinen wrote:
> > On Tue, Feb 20, 2018 at 11:27:25PM +0200, Jarkko Sakkinen wrote:
> > > 
> > > On Tue, Feb 20, 2018 at 09:26:00PM +0200, Jarkko Sakkinen wrote:
> > > > 
> > > > On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
> > > > > 
> > > > > On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> > > > > > 
> > > > > > I get merge conflicts in my tree but I'll review this.
> > > > > 
> > > > > You told me to rebase it on that patch that wasn't in your
> > > > > tree:
> > > > > 
> > > > > https://patchwork.kernel.org/patch/10208965/
> > > > > 
> > > > > If you put it in your tree, I can just rebase on top of
> > > > > everything.
> > > > > 
> > > > > James
> > > > 
> > > > Aah. Sorry, my bad I'll move forward on testing :-)
> > > > 
> > > > /Jarkko
> > > 
> > > Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > > 
> > > /Jarkko
> > 
> > These must be taken away:
> > 
> > [    2.843641] tpm tpm0: TPM: running incremental selftest
> > [    2.854087] tpm tpm0: TPM: selftest succeeded
> 
> I'm not wedded to having the messages, but it helps give context when
> something like this happens:
> 
> [    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
> [    1.550108] tpm tpm0: TPM: running incremental selftest
> [    1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest
> [    1.602320] tpm tpm0: TPM: incremental selftest failed
> [    1.602322] tpm tpm0: TPM: running full selftest
> [    2.523689] tpm tpm0: TPM: selftest succeeded
> 
> It also helps explain why I lost a second in the boot sequence to the
> TPM having to run a full self test.
> 
> Without the chatty messages, what you see is 
> 
> [    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
> [
>     1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest
> 
> Which is a bit harder to interpret.
> 
> James

Way too much bloat to klog. Telling whether the full or incremental test
failed makes sense. Otherwise, not so much.

I updated the patch (v4) with things cleaned up that add bloat to the
driver code and squashed tpm_buf patch to it.

/Jarkko
Paul Menzel Feb. 21, 2018, 7:42 a.m. UTC | #10
Dera Jarkko,


Am 21.02.2018 um 08:25 schrieb Jarkko Sakkinen:
> On Tue, Feb 20, 2018 at 06:09:51PM -0500, James Bottomley wrote:
>> On Tue, 2018-02-20 at 23:28 +0200, Jarkko Sakkinen wrote:
>>> On Tue, Feb 20, 2018 at 11:27:25PM +0200, Jarkko Sakkinen wrote:
>>>>
>>>> On Tue, Feb 20, 2018 at 09:26:00PM +0200, Jarkko Sakkinen wrote:
>>>>>
>>>>> On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
>>>>>>
>>>>>> On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
>>>>>>>
>>>>>>> I get merge conflicts in my tree but I'll review this.
>>>>>>
>>>>>> You told me to rebase it on that patch that wasn't in your
>>>>>> tree:
>>>>>>
>>>>>> https://patchwork.kernel.org/patch/10208965/
>>>>>>
>>>>>> If you put it in your tree, I can just rebase on top of
>>>>>> everything.
>>>>>>
>>>>>> James
>>>>>
>>>>> Aah. Sorry, my bad I'll move forward on testing :-)
>>>>>
>>>>> /Jarkko
>>>>
>>>> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
>>>>
>>>> /Jarkko
>>>
>>> These must be taken away:
>>>
>>> [    2.843641] tpm tpm0: TPM: running incremental selftest
>>> [    2.854087] tpm tpm0: TPM: selftest succeeded
>>
>> I'm not wedded to having the messages, but it helps give context when
>> something like this happens:
>>
>> [    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
>> [    1.550108] tpm tpm0: TPM: running incremental selftest
>> [    1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest
>> [    1.602320] tpm tpm0: TPM: incremental selftest failed
>> [    1.602322] tpm tpm0: TPM: running full selftest
>> [    2.523689] tpm tpm0: TPM: selftest succeeded
>>
>> It also helps explain why I lost a second in the boot sequence to the
>> TPM having to run a full self test.
>>
>> Without the chatty messages, what you see is
>>
>> [    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
>> [
>>      1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest
>>
>> Which is a bit harder to interpret.

> Way too much bloat to klog. Telling whether the full or incremental test
> failed makes sense. Otherwise, not so much.

As a user, I agree with James here. Having more elaborate information 
would have helped me in the past. Two more lines won’t hurt in my 
opinion either.

[…]


Kind regards,

Paul
Jarkko Sakkinen Feb. 21, 2018, 1:36 p.m. UTC | #11
On Wed, Feb 21, 2018 at 08:42:36AM +0100, Paul Menzel wrote:
> Dera Jarkko,
> 
> 
> Am 21.02.2018 um 08:25 schrieb Jarkko Sakkinen:
> > On Tue, Feb 20, 2018 at 06:09:51PM -0500, James Bottomley wrote:
> > > On Tue, 2018-02-20 at 23:28 +0200, Jarkko Sakkinen wrote:
> > > > On Tue, Feb 20, 2018 at 11:27:25PM +0200, Jarkko Sakkinen wrote:
> > > > > 
> > > > > On Tue, Feb 20, 2018 at 09:26:00PM +0200, Jarkko Sakkinen wrote:
> > > > > > 
> > > > > > On Tue, Feb 20, 2018 at 12:51:51PM -0500, James Bottomley wrote:
> > > > > > > 
> > > > > > > On Tue, 2018-02-20 at 19:43 +0200, Jarkko Sakkinen wrote:
> > > > > > > > 
> > > > > > > > I get merge conflicts in my tree but I'll review this.
> > > > > > > 
> > > > > > > You told me to rebase it on that patch that wasn't in your
> > > > > > > tree:
> > > > > > > 
> > > > > > > https://patchwork.kernel.org/patch/10208965/
> > > > > > > 
> > > > > > > If you put it in your tree, I can just rebase on top of
> > > > > > > everything.
> > > > > > > 
> > > > > > > James
> > > > > > 
> > > > > > Aah. Sorry, my bad I'll move forward on testing :-)
> > > > > > 
> > > > > > /Jarkko
> > > > > 
> > > > > Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > > > > 
> > > > > /Jarkko
> > > > 
> > > > These must be taken away:
> > > > 
> > > > [    2.843641] tpm tpm0: TPM: running incremental selftest
> > > > [    2.854087] tpm tpm0: TPM: selftest succeeded
> > > 
> > > I'm not wedded to having the messages, but it helps give context when
> > > something like this happens:
> > > 
> > > [    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
> > > [    1.550108] tpm tpm0: TPM: running incremental selftest
> > > [    1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest
> > > [    1.602320] tpm tpm0: TPM: incremental selftest failed
> > > [    1.602322] tpm tpm0: TPM: running full selftest
> > > [    2.523689] tpm tpm0: TPM: selftest succeeded
> > > 
> > > It also helps explain why I lost a second in the boot sequence to the
> > > TPM having to run a full self test.
> > > 
> > > Without the chatty messages, what you see is
> > > 
> > > [    1.550099] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)
> > > [
> > >      1.602294] tpm tpm0: A TPM error (323) occurred incremental selftest
> > > 
> > > Which is a bit harder to interpret.
> 
> > Way too much bloat to klog. Telling whether the full or incremental test
> > failed makes sense. Otherwise, not so much.
> 
> As a user, I agree with James here. Having more elaborate information would
> have helped me in the past. Two more lines won’t hurt in my opinion either.

Reporting about success conditions is usually not a great idea. It is
extra noise to the already crowded log. And we do not want three klog
messages telling that the self test failed. It is just ridiculous. And
we do not need a log message telling that self tests have been started.
You can interfere that if you want by using ftrace.

I've never liked tpm_transmit_cmd() desc parameter, though. I think that
should be eventually ripped off. The problem with that parameter is that
it hard to generalize any senseful log message that would be a good fit
for basically anything. I've only used it because it was there when I
took over this subsystem.

What I would suggest here is this:

1. Pass NULL tpm_transmit_cmd().
2. Print one a good fit error message in tpm_do_self_test() instead.

/Jarkko
diff mbox

Patch

diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 9e80a953d693..b9a8914b05d9 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -515,6 +515,41 @@  ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
 	return rc ? rc : len;
 }
 
+static ssize_t tpm_transmit_check(struct tpm_chip *chip,
+				  struct tpm_space *space, const void *buf,
+				  size_t bufsiz, unsigned int flags)
+{
+	const struct tpm_output_header *header = buf;
+	unsigned int delay_msec = 20;
+	ssize_t len;
+	u32 err;
+
+	/*
+	 * on first probe we kick off a TPM self test in the
+	 * background This means the TPM may return RC_TESTING to any
+	 * command that tries to use a subsystem under test, so do an
+	 * exponential backoff wait if that happens
+	 */
+	for (;;) {
+		len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
+		if (len <  0)
+			return len;
+
+		err = be32_to_cpu(header->return_code);
+		if (err != TPM2_RC_TESTING ||
+		    (flags & TPM_TRANSMIT_NO_WAIT_TESTING))
+			break;
+
+		delay_msec *= 2;
+		if (delay_msec > TPM2_DURATION_LONG) {
+			dev_err(&chip->dev,"TPM: still running self tests, giving up waiting\n");
+			break;
+		}
+		tpm_msleep(delay_msec);
+	}
+	return len;
+}
+
 /**
  * tmp_transmit_cmd - send a tpm command to the device
  *    The function extracts tpm out header return code
@@ -540,7 +575,7 @@  ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct tpm_space *space,
 	int err;
 	ssize_t len;
 
-	len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
+	len = tpm_transmit_check(chip, space, buf, bufsiz, flags);
 	if (len <  0)
 		return len;
 
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index f895fba4e20d..3e083a30a108 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -104,6 +104,7 @@  enum tpm2_return_codes {
 	TPM2_RC_HASH		= 0x0083, /* RC_FMT1 */
 	TPM2_RC_HANDLE		= 0x008B,
 	TPM2_RC_INITIALIZE	= 0x0100, /* RC_VER1 */
+	TPM2_RC_FAILURE		= 0x0101,
 	TPM2_RC_DISABLED	= 0x0120,
 	TPM2_RC_COMMAND_CODE    = 0x0143,
 	TPM2_RC_TESTING		= 0x090A, /* RC_WARN */
@@ -499,8 +500,9 @@  extern const struct file_operations tpmrm_fops;
 extern struct idr dev_nums_idr;
 
 enum tpm_transmit_flags {
-	TPM_TRANSMIT_UNLOCKED	= BIT(0),
-	TPM_TRANSMIT_RAW	= BIT(1),
+	TPM_TRANSMIT_UNLOCKED		= BIT(0),
+	TPM_TRANSMIT_RAW		= BIT(1),
+	TPM_TRANSMIT_NO_WAIT_TESTING	= BIT(2),
 };
 
 ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index e5052e2f1924..13d9e74084aa 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -838,29 +838,55 @@  EXPORT_SYMBOL_GPL(tpm2_calc_ordinal_duration);
 static int tpm2_do_selftest(struct tpm_chip *chip)
 {
 	int rc;
-	unsigned int delay_msec = 10;
-	long duration;
 	struct tpm_buf buf;
+	int full_test;
 
-	duration = jiffies_to_msecs(
-		tpm2_calc_ordinal_duration(chip, TPM2_CC_SELF_TEST));
+	for (full_test = 0; full_test < 2; full_test++) {
+		const char *test_str = full_test ?
+			"full selftest" : "incremental selftest";
 
-	for (;;) {
 		rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS, TPM2_CC_SELF_TEST);
 		if (rc)
 			return rc;
 
-		/* Perform tests in the background. */
-		tpm_buf_append_u8(&buf, 0);
+		tpm_buf_append_u8(&buf, full_test);
+		dev_info(&chip->dev, "TPM: running %s\n", test_str);
 		rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0,
-				      "continue selftest");
+				      test_str);
 		tpm_buf_destroy(&buf);
-		if (rc != TPM2_RC_TESTING || delay_msec >= duration)
+
+		if (rc == TPM2_RC_INITIALIZE)
+			/* TPM needs startup command */
+			return rc;
+
+		if (rc == TPM2_RC_TESTING) {
+			/*
+			 * A return of RC_TESTING means the TPM is still
+			 * running self tests.  If one fails it will go into
+			 * failure mode and return RC_FAILED to every command,
+			 * so treat a still in testing return as a success
+			 * rather than causing a driver detach.
+			 */
+			dev_info(&chip->dev, "TPM: Running self test in background\n");
+			rc = TPM2_RC_SUCCESS;
+		}
+		if (rc == TPM2_RC_SUCCESS)
 			break;
 
-		/* wait longer than before */
-		delay_msec *= 2;
-		tpm_msleep(delay_msec);
+		dev_info(&chip->dev, "TPM: %s failed\n", test_str);
+	}
+
+	if (rc == TPM2_RC_SUCCESS) {
+		dev_info(&chip->dev, "TPM: selftest succeeded\n");
+	} else if (rc == TPM2_RC_FAILURE) {
+		dev_err(&chip->dev, "TPM: selftest failed\n");
+	} else {
+		/*
+		 * Any other error code is unexpected but should not
+		 * prevent the driver from attaching
+		 */
+		dev_err(&chip->dev, "TPM: selftest returns 0x%x, continuing anyway\n", rc);
+		rc = TPM2_RC_SUCCESS;
 	}
 
 	return rc;