diff mbox series

[v2] tpm: fix ATMEL TPM crash caused by too frequent queries

Message ID 20210707043135.33434-1-hao.wu@rubrik.com (mailing list archive)
State New, archived
Headers show
Series [v2] tpm: fix ATMEL TPM crash caused by too frequent queries | expand

Commit Message

Hao Wu July 7, 2021, 4:31 a.m. UTC
Since kernel 4.14, there was a commit (9f3fc7bcddcb)
fixed the TPM sleep logic from msleep to usleep_range,
so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for around 15ms.

This timeout change caused the ATMEL 1.2 TPM chip crash,
and the patch here is to fix it in the master branch.
Crash signature is as follows:
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135),
I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
```

With at few more changes after 4.14, the timeout was
reduced to less than 1 ms today in the master branch.
- in 4.16 commit cf151a9a44d5 uses `TPM_POLL_SLEEP` instead of
  TPM_TIMEOUT for `wait_for_tpm_stat` and set `TPM_POLL_SLEEP` (1ms).
- in 4.18 commits 59f5a6b07f64 and 424eaf910c32 further
  reduced the timeout in wait_for_tpm_stat to less than 1ms.

This patch is to fix the TPM crash for ATMEL 1.2 TPM chip.
We specifically use 15ms timeout for the ATMEL 1.2 TPM chip
in wait_for_tpm_stat, but keep the timeout for other chips
unchanged. The 15ms timeout was the timeout
works for ATMEL 1.2 TPM chip before 4.14.

Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/
Signed-off-by: Hao Wu <hao.wu@rubrik.com>
---
This version (v2) has following changes on top of the last (v1):
- follow the existing way to define two timeouts (min and max)
  for ATMEL chip, thus keep the exact timeout logic for 
  non-ATEML chips.
- limit the timeout increase to only ATMEL TPM 1.2 chips,
  because it is not an issue for TPM 2.0 chips yet.

Test Plan:
- Run fixed kernel with ATMEL TPM chips and see crash
has been fixed.
- Run fixed kernel with non-ATMEL TPM chips, and confirm
the timeout has not been changed.

 drivers/char/tpm/tpm.h          |  6 ++++--
 drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
 include/linux/tpm.h             |  3 +++
 3 files changed, 28 insertions(+), 4 deletions(-)

Comments

Jarkko Sakkinen July 7, 2021, 9:24 a.m. UTC | #1
On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
> Since kernel 4.14, there was a commit (9f3fc7bcddcb)
> fixed the TPM sleep logic from msleep to usleep_range,
> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
> Before that fix, msleep(5) actually sleeps for around 15ms.

What is TPM sleep logic?

/Jarkko
Hao Wu July 7, 2021, 6:28 p.m. UTC | #2
> On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
>> Since kernel 4.14, there was a commit (9f3fc7bcddcb)
>> fixed the TPM sleep logic from msleep to usleep_range,
>> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
>> Before that fix, msleep(5) actually sleeps for around 15ms.
> 
> What is TPM sleep logic?
It is about the commit metnioned in the description
`tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers`
https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

Any better description or terms ?

> 
> /Jarkko

Hao
Jarkko Sakkinen July 7, 2021, 9:10 p.m. UTC | #3
On Wed, Jul 07, 2021 at 11:28:35AM -0700, Hao Wu wrote:
> > On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > 
> > On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
> >> Since kernel 4.14, there was a commit (9f3fc7bcddcb)

BTW, please remove this. You have a fixes tag.

> >> fixed the TPM sleep logic from msleep to usleep_range,
> >> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
> >> Before that fix, msleep(5) actually sleeps for around 15ms.
> > 
> > What is TPM sleep logic?
> It is about the commit metnioned in the description
> `tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers`
> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

What you should do is to explain in simple terms the unwanted behaviour
that you are observing, and also, *when* you observe it. E.g. does this
happen when you use /dev/tpm0, or is it visible already in klog at boot
time. And also: does it occur for anything you put to /dev/tpm0, or is
the bug triggering for some particular TPM commands.

You also need to have information what kind of Atmel chip triggers the
bug. I'd presume that you have access to a machine with such chip?

When you get all that figured out, you should explain how you change
the existing behaviour, and why it makes sense. E.g. if you fixup
timeouts, please just tell how'd you end up choosing the values that
you picked.

E.g. the rationale for that could come from testing and finding the "sweet
spot", or perhaps the reason could be that old values worked, new ones
don't.

Especially in bug fixes the reasoning is *at least* as important as the
the code change itself because I need to know what is going on.

/Jarkko
Hao Wu July 9, 2021, 4:43 a.m. UTC | #4
> On Jul 7, 2021, at 2:10 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> On Wed, Jul 07, 2021 at 11:28:35AM -0700, Hao Wu wrote:
>>> On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote:
>>> 
>>> On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote:
>>>> Since kernel 4.14, there was a commit (9f3fc7bcddcb)
> 
> BTW, please remove this. You have a fixes tag.
> 
>>>> fixed the TPM sleep logic from msleep to usleep_range,
>>>> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
>>>> Before that fix, msleep(5) actually sleeps for around 15ms.
>>> 
>>> What is TPM sleep logic?
>> It is about the commit metnioned in the description
>> `tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers`
>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
> 
> What you should do is to explain in simple terms the unwanted behaviour
> that you are observing, and also, *when* you observe it. E.g. does this
> happen when you use /dev/tpm0, or is it visible already in klog at boot
> time. And also: does it occur for anything you put to /dev/tpm0, or is
> the bug triggering for some particular TPM commands.
> 
> You also need to have information what kind of Atmel chip triggers the
> bug. I'd presume that you have access to a machine with such chip?
> 
> When you get all that figured out, you should explain how you change
> the existing behaviour, and why it makes sense. E.g. if you fixup
> timeouts, please just tell how'd you end up choosing the values that
> you picked.
> 
> E.g. the rationale for that could come from testing and finding the "sweet
> spot", or perhaps the reason could be that old values worked, new ones
> don't.
> 
> Especially in bug fixes the reasoning is *at least* as important as the
> the code change itself because I need to know what is going on.
> 
> /Jarkko

Thanks Jarkko for pointing the direction! I have updated the description
and sent a new revision.

Hao
diff mbox series

Patch

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 283f78211c3a..6de1b44c4aab 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,8 +41,10 @@  enum tpm_timeout {
 	TPM_TIMEOUT_RETRY = 100, /* msecs */
 	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
 	TPM_TIMEOUT_POLL = 1,	/* msecs */
-	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
-	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
+	TPM_TIMEOUT_USECS_MIN = 100,	/* usecs */
+	TPM_TIMEOUT_USECS_MAX = 500,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700,	/* usecs */
+	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000	/* usecs */
 };
 
 /* TPM addresses */
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 55b9d3965ae1..ae27d66fdd94 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,17 @@  static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
 		}
 	} else {
 		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			/* this code path could be executed before
+			 * timeouts initialized in chip instance.
+			 */
+			if (chip->timeout_wait_stat_min &&
+			    chip->timeout_wait_stat_max)
+				usleep_range(chip->timeout_wait_stat_min,
+					     chip->timeout_wait_stat_max);
+			else
+				usleep_range(TPM_TIMEOUT_USECS_MIN,
+					     TPM_TIMEOUT_USECS_MAX);
+
 			status = chip->ops->status(chip);
 			if ((status & mask) == mask)
 				return 0;
@@ -934,6 +943,9 @@  int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX);
 	chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX);
 	chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX);
+	/* init timeouts for wait_for_tpm_stat */
+	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
+	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
 	priv->phy_ops = phy_ops;
 	dev_set_drvdata(&chip->dev, priv);
 
@@ -983,6 +995,13 @@  int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 
 	priv->manufacturer_id = vendor;
 
+	if (priv->manufacturer_id == TPM_VID_ATML &&
+		!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
+		/* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/
+		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
+		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
+	}
+
 	rc = tpm_tis_read8(priv, TPM_RID(0), &rid);
 	if (rc < 0)
 		goto out_err;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index aa11fe323c56..171b9102c976 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -150,6 +150,8 @@  struct tpm_chip {
 	bool timeout_adjusted;
 	unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */
 	bool duration_adjusted;
+	unsigned int timeout_wait_stat_min; /* usecs */
+	unsigned int timeout_wait_stat_max; /* usecs */
 
 	struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES];
 
@@ -269,6 +271,7 @@  enum tpm2_cc_attrs {
 #define TPM_VID_INTEL    0x8086
 #define TPM_VID_WINBOND  0x1050
 #define TPM_VID_STM      0x104A
+#define TPM_VID_ATML     0x1114
 
 enum tpm_chip_flags {
 	TPM_CHIP_FLAG_TPM2		= BIT(1),