Message ID | 20210707043135.33434-1-hao.wu@rubrik.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] tpm: fix ATMEL TPM crash caused by too frequent queries | expand |
On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote: > Since kernel 4.14, there was a commit (9f3fc7bcddcb) > fixed the TPM sleep logic from msleep to usleep_range, > so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward. > Before that fix, msleep(5) actually sleeps for around 15ms. What is TPM sleep logic? /Jarkko
> On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote: > > On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote: >> Since kernel 4.14, there was a commit (9f3fc7bcddcb) >> fixed the TPM sleep logic from msleep to usleep_range, >> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward. >> Before that fix, msleep(5) actually sleeps for around 15ms. > > What is TPM sleep logic? It is about the commit metnioned in the description `tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers` https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 Any better description or terms ? > > /Jarkko Hao
On Wed, Jul 07, 2021 at 11:28:35AM -0700, Hao Wu wrote: > > On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote: > > > > On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote: > >> Since kernel 4.14, there was a commit (9f3fc7bcddcb) BTW, please remove this. You have a fixes tag. > >> fixed the TPM sleep logic from msleep to usleep_range, > >> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward. > >> Before that fix, msleep(5) actually sleeps for around 15ms. > > > > What is TPM sleep logic? > It is about the commit metnioned in the description > `tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers` > https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 What you should do is to explain in simple terms the unwanted behaviour that you are observing, and also, *when* you observe it. E.g. does this happen when you use /dev/tpm0, or is it visible already in klog at boot time. And also: does it occur for anything you put to /dev/tpm0, or is the bug triggering for some particular TPM commands. You also need to have information what kind of Atmel chip triggers the bug. I'd presume that you have access to a machine with such chip? When you get all that figured out, you should explain how you change the existing behaviour, and why it makes sense. E.g. if you fixup timeouts, please just tell how'd you end up choosing the values that you picked. E.g. the rationale for that could come from testing and finding the "sweet spot", or perhaps the reason could be that old values worked, new ones don't. Especially in bug fixes the reasoning is *at least* as important as the the code change itself because I need to know what is going on. /Jarkko
> On Jul 7, 2021, at 2:10 PM, Jarkko Sakkinen <jarkko@kernel.org> wrote: > > On Wed, Jul 07, 2021 at 11:28:35AM -0700, Hao Wu wrote: >>> On Jul 7, 2021, at 2:24 AM, Jarkko Sakkinen <jarkko@kernel.org> wrote: >>> >>> On Tue, Jul 06, 2021 at 09:31:35PM -0700, Hao Wu wrote: >>>> Since kernel 4.14, there was a commit (9f3fc7bcddcb) > > BTW, please remove this. You have a fixes tag. > >>>> fixed the TPM sleep logic from msleep to usleep_range, >>>> so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward. >>>> Before that fix, msleep(5) actually sleeps for around 15ms. >>> >>> What is TPM sleep logic? >> It is about the commit metnioned in the description >> `tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers` >> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 > > What you should do is to explain in simple terms the unwanted behaviour > that you are observing, and also, *when* you observe it. E.g. does this > happen when you use /dev/tpm0, or is it visible already in klog at boot > time. And also: does it occur for anything you put to /dev/tpm0, or is > the bug triggering for some particular TPM commands. > > You also need to have information what kind of Atmel chip triggers the > bug. I'd presume that you have access to a machine with such chip? > > When you get all that figured out, you should explain how you change > the existing behaviour, and why it makes sense. E.g. if you fixup > timeouts, please just tell how'd you end up choosing the values that > you picked. > > E.g. the rationale for that could come from testing and finding the "sweet > spot", or perhaps the reason could be that old values worked, new ones > don't. > > Especially in bug fixes the reasoning is *at least* as important as the > the code change itself because I need to know what is going on. > > /Jarkko Thanks Jarkko for pointing the direction! I have updated the description and sent a new revision. Hao
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h index 283f78211c3a..6de1b44c4aab 100644 --- a/drivers/char/tpm/tpm.h +++ b/drivers/char/tpm/tpm.h @@ -41,8 +41,10 @@ enum tpm_timeout { TPM_TIMEOUT_RETRY = 100, /* msecs */ TPM_TIMEOUT_RANGE_US = 300, /* usecs */ TPM_TIMEOUT_POLL = 1, /* msecs */ - TPM_TIMEOUT_USECS_MIN = 100, /* usecs */ - TPM_TIMEOUT_USECS_MAX = 500 /* usecs */ + TPM_TIMEOUT_USECS_MIN = 100, /* usecs */ + TPM_TIMEOUT_USECS_MAX = 500, /* usecs */ + TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700, /* usecs */ + TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000 /* usecs */ }; /* TPM addresses */ diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index 55b9d3965ae1..ae27d66fdd94 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask, } } else { do { - usleep_range(TPM_TIMEOUT_USECS_MIN, - TPM_TIMEOUT_USECS_MAX); + /* this code path could be executed before + * timeouts initialized in chip instance. + */ + if (chip->timeout_wait_stat_min && + chip->timeout_wait_stat_max) + usleep_range(chip->timeout_wait_stat_min, + chip->timeout_wait_stat_max); + else + usleep_range(TPM_TIMEOUT_USECS_MIN, + TPM_TIMEOUT_USECS_MAX); + status = chip->ops->status(chip); if ((status & mask) == mask) return 0; @@ -934,6 +943,9 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, chip->timeout_b = msecs_to_jiffies(TIS_TIMEOUT_B_MAX); chip->timeout_c = msecs_to_jiffies(TIS_TIMEOUT_C_MAX); chip->timeout_d = msecs_to_jiffies(TIS_TIMEOUT_D_MAX); + /* init timeouts for wait_for_tpm_stat */ + chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN; + chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX; priv->phy_ops = phy_ops; dev_set_drvdata(&chip->dev, priv); @@ -983,6 +995,13 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq, priv->manufacturer_id = vendor; + if (priv->manufacturer_id == TPM_VID_ATML && + !(chip->flags & TPM_CHIP_FLAG_TPM2)) { + /* If TPM chip is 1.2 ATMEL chip, timeout need to be relaxed*/ + chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN; + chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX; + } + rc = tpm_tis_read8(priv, TPM_RID(0), &rid); if (rc < 0) goto out_err; diff --git a/include/linux/tpm.h b/include/linux/tpm.h index aa11fe323c56..171b9102c976 100644 --- a/include/linux/tpm.h +++ b/include/linux/tpm.h @@ -150,6 +150,8 @@ struct tpm_chip { bool timeout_adjusted; unsigned long duration[TPM_NUM_DURATIONS]; /* jiffies */ bool duration_adjusted; + unsigned int timeout_wait_stat_min; /* usecs */ + unsigned int timeout_wait_stat_max; /* usecs */ struct dentry *bios_dir[TPM_NUM_EVENT_LOG_FILES]; @@ -269,6 +271,7 @@ enum tpm2_cc_attrs { #define TPM_VID_INTEL 0x8086 #define TPM_VID_WINBOND 0x1050 #define TPM_VID_STM 0x104A +#define TPM_VID_ATML 0x1114 enum tpm_chip_flags { TPM_CHIP_FLAG_TPM2 = BIT(1),
Since kernel 4.14, there was a commit (9f3fc7bcddcb) fixed the TPM sleep logic from msleep to usleep_range, so that the TPM sleeps exactly with TPM_TIMEOUT (=5ms) afterward. Before that fix, msleep(5) actually sleeps for around 15ms. This timeout change caused the ATMEL 1.2 TPM chip crash, and the patch here is to fix it in the master branch. Crash signature is as follows: ``` $ tpm_sealdata -z Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error $ sudo dmesg | grep tpm0 [59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62 ``` With at few more changes after 4.14, the timeout was reduced to less than 1 ms today in the master branch. - in 4.16 commit cf151a9a44d5 uses `TPM_POLL_SLEEP` instead of TPM_TIMEOUT for `wait_for_tpm_stat` and set `TPM_POLL_SLEEP` (1ms). - in 4.18 commits 59f5a6b07f64 and 424eaf910c32 further reduced the timeout in wait_for_tpm_stat to less than 1ms. This patch is to fix the TPM crash for ATMEL 1.2 TPM chip. We specifically use 15ms timeout for the ATMEL 1.2 TPM chip in wait_for_tpm_stat, but keep the timeout for other chips unchanged. The 15ms timeout was the timeout works for ATMEL 1.2 TPM chip before 4.14. Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers") Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@rubrik.com/ Signed-off-by: Hao Wu <hao.wu@rubrik.com> --- This version (v2) has following changes on top of the last (v1): - follow the existing way to define two timeouts (min and max) for ATMEL chip, thus keep the exact timeout logic for non-ATEML chips. - limit the timeout increase to only ATMEL TPM 1.2 chips, because it is not an issue for TPM 2.0 chips yet. Test Plan: - Run fixed kernel with ATMEL TPM chips and see crash has been fixed. - Run fixed kernel with non-ATMEL TPM chips, and confirm the timeout has not been changed. drivers/char/tpm/tpm.h | 6 ++++-- drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++-- include/linux/tpm.h | 3 +++ 3 files changed, 28 insertions(+), 4 deletions(-)