Message ID | 20201015214430.17937-1-jsnitsel@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tpm_tis: Disable interrupts on ThinkPad T490s | expand |
James should this get tacked on the end of your patchset? Regards, Jerry
On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: > > There is a misconfiguration in the bios of the gpio pin used for the > interrupt in the T490s. When interrupts are enabled in the tpm_tis > driver code this results in an interrupt storm. This was initially > reported when we attempted to enable the interrupt code in the tpm_tis > driver, which previously wasn't setting a flag to enable it. Due to > the reports of the interrupt storm that code was reverted and we went back > to polling instead of using interrupts. Now that we know the T490s problem > is a firmware issue, add code to check if the system is a T490s and > disable interrupts if that is the case. This will allow us to enable > interrupts for everyone else. If the user has a fixed bios they can > force the enabling of interrupts with tpm_tis.interrupts=1 on the > kernel command line. I think an implication of this is that systems haven't been well-tested with interrupts enabled. In general when we've found a firmware issue in one place it ends up happening elsewhere as well, so it wouldn't surprise me if there are other machines that will also be unhappy with interrupts enabled. Would it be possible to automatically detect this case (eg, if we get more than a certain number of interrupts in a certain timeframe immediately after enabling the interrupt) and automatically fall back to polling in that case? It would also mean that users with fixed firmware wouldn't need to pass a parameter.
Hi, On 10/15/20 11:44 PM, Jerry Snitselaar wrote: > There is a misconfiguration in the bios of the gpio pin used for the > interrupt in the T490s. When interrupts are enabled in the tpm_tis > driver code this results in an interrupt storm. This was initially > reported when we attempted to enable the interrupt code in the tpm_tis > driver, which previously wasn't setting a flag to enable it. Due to > the reports of the interrupt storm that code was reverted and we went back > to polling instead of using interrupts. Now that we know the T490s problem > is a firmware issue, add code to check if the system is a T490s and > disable interrupts if that is the case. This will allow us to enable > interrupts for everyone else. If the user has a fixed bios they can > force the enabling of interrupts with tpm_tis.interrupts=1 on the > kernel command line. > > Cc: jarkko@kernel.org > Cc: Peter Huewe <peterhuewe@gmx.de> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Hans de Goede <hdegoede@redhat.com> > Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com> > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Patch looks good to me: Reviewed-by: Hans de Goede <hdegoede@redhat.com> Regards, Hans > --- > drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++-- > 1 file changed, 27 insertions(+), 2 deletions(-) > > diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c > index 0b214963539d..4ed6e660273a 100644 > --- a/drivers/char/tpm/tpm_tis.c > +++ b/drivers/char/tpm/tpm_tis.c > @@ -27,6 +27,7 @@ > #include <linux/of.h> > #include <linux/of_device.h> > #include <linux/kernel.h> > +#include <linux/dmi.h> > #include "tpm.h" > #include "tpm_tis_core.h" > > @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da > return container_of(data, struct tpm_tis_tcg_phy, priv); > } > > -static bool interrupts = true; > -module_param(interrupts, bool, 0444); > +static int interrupts = -1; > +module_param(interrupts, int, 0444); > MODULE_PARM_DESC(interrupts, "Enable interrupts"); > > static bool itpm; > @@ -63,6 +64,28 @@ module_param(force, bool, 0444); > MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry"); > #endif > > +static int tpm_tis_disable_irq(const struct dmi_system_id *d) > +{ > + if (interrupts == -1) { > + pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident); > + interrupts = 0; > + } > + > + return 0; > +} > + > +static const struct dmi_system_id tpm_tis_dmi_table[] = { > + { > + .callback = tpm_tis_disable_irq, > + .ident = "ThinkPad T490s", > + .matches = { > + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), > + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"), > + }, > + }, > + {} > +}; > + > #if defined(CONFIG_PNP) && defined(CONFIG_ACPI) > static int has_hid(struct acpi_device *dev, const char *hid) > { > @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info) > int irq = -1; > int rc; > > + dmi_check_system(tpm_tis_dmi_table); > + > rc = check_acpi_tpm2(dev); > if (rc) > return rc; >
Hi, On 10/16/20 12:39 AM, Matthew Garrett wrote: > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >> >> There is a misconfiguration in the bios of the gpio pin used for the >> interrupt in the T490s. When interrupts are enabled in the tpm_tis >> driver code this results in an interrupt storm. This was initially >> reported when we attempted to enable the interrupt code in the tpm_tis >> driver, which previously wasn't setting a flag to enable it. Due to >> the reports of the interrupt storm that code was reverted and we went back >> to polling instead of using interrupts. Now that we know the T490s problem >> is a firmware issue, add code to check if the system is a T490s and >> disable interrupts if that is the case. This will allow us to enable >> interrupts for everyone else. If the user has a fixed bios they can >> force the enabling of interrupts with tpm_tis.interrupts=1 on the >> kernel command line. > > I think an implication of this is that systems haven't been > well-tested with interrupts enabled. In general when we've found a > firmware issue in one place it ends up happening elsewhere as well, so > it wouldn't surprise me if there are other machines that will also be > unhappy with interrupts enabled. Would it be possible to automatically > detect this case (eg, if we get more than a certain number of > interrupts in a certain timeframe immediately after enabling the > interrupt) and automatically fall back to polling in that case? It > would also mean that users with fixed firmware wouldn't need to pass a > parameter. IIRC then at least on the T490 the irq storm caused systems to not boot in some cases. I guess if we detect the storm and disable the irq we might fix that... OTOH this problem seems to only hit a certain generation of Thinkpads so with some luck the denylist should not be too big and the denylist approach should work. Regards, Hans
On Thu, Oct 15, 2020 at 02:44:30PM -0700, Jerry Snitselaar wrote: > There is a misconfiguration in the bios of the gpio pin used for the > interrupt in the T490s. When interrupts are enabled in the tpm_tis > driver code this results in an interrupt storm. This was initially > reported when we attempted to enable the interrupt code in the tpm_tis > driver, which previously wasn't setting a flag to enable it. Due to > the reports of the interrupt storm that code was reverted and we went back > to polling instead of using interrupts. Now that we know the T490s problem > is a firmware issue, add code to check if the system is a T490s and > disable interrupts if that is the case. This will allow us to enable > interrupts for everyone else. If the user has a fixed bios they can > force the enabling of interrupts with tpm_tis.interrupts=1 on the > kernel command line. > > Cc: jarkko@kernel.org > Cc: Peter Huewe <peterhuewe@gmx.de> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Hans de Goede <hdegoede@redhat.com> > Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com> > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> I'll apply this and make it available in linux-next. /Jarkko > --- > drivers/char/tpm/tpm_tis.c | 29 +++++++++++++++++++++++++++-- > 1 file changed, 27 insertions(+), 2 deletions(-) > > diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c > index 0b214963539d..4ed6e660273a 100644 > --- a/drivers/char/tpm/tpm_tis.c > +++ b/drivers/char/tpm/tpm_tis.c > @@ -27,6 +27,7 @@ > #include <linux/of.h> > #include <linux/of_device.h> > #include <linux/kernel.h> > +#include <linux/dmi.h> > #include "tpm.h" > #include "tpm_tis_core.h" > > @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da > return container_of(data, struct tpm_tis_tcg_phy, priv); > } > > -static bool interrupts = true; > -module_param(interrupts, bool, 0444); > +static int interrupts = -1; > +module_param(interrupts, int, 0444); > MODULE_PARM_DESC(interrupts, "Enable interrupts"); > > static bool itpm; > @@ -63,6 +64,28 @@ module_param(force, bool, 0444); > MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry"); > #endif > > +static int tpm_tis_disable_irq(const struct dmi_system_id *d) > +{ > + if (interrupts == -1) { > + pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident); > + interrupts = 0; > + } > + > + return 0; > +} > + > +static const struct dmi_system_id tpm_tis_dmi_table[] = { > + { > + .callback = tpm_tis_disable_irq, > + .ident = "ThinkPad T490s", > + .matches = { > + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), > + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"), > + }, > + }, > + {} > +}; > + > #if defined(CONFIG_PNP) && defined(CONFIG_ACPI) > static int has_hid(struct acpi_device *dev, const char *hid) > { > @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info) > int irq = -1; > int rc; > > + dmi_check_system(tpm_tis_dmi_table); > + > rc = check_acpi_tpm2(dev); > if (rc) > return rc; > -- > 2.28.0 > >
On Mon, Oct 19, 2020 at 12:11:44AM +0300, Jarkko Sakkinen wrote: > On Thu, Oct 15, 2020 at 02:44:30PM -0700, Jerry Snitselaar wrote: > > There is a misconfiguration in the bios of the gpio pin used for the > > interrupt in the T490s. When interrupts are enabled in the tpm_tis > > driver code this results in an interrupt storm. This was initially > > reported when we attempted to enable the interrupt code in the tpm_tis > > driver, which previously wasn't setting a flag to enable it. Due to > > the reports of the interrupt storm that code was reverted and we went back > > to polling instead of using interrupts. Now that we know the T490s problem > > is a firmware issue, add code to check if the system is a T490s and > > disable interrupts if that is the case. This will allow us to enable > > interrupts for everyone else. If the user has a fixed bios they can > > force the enabling of interrupts with tpm_tis.interrupts=1 on the > > kernel command line. > > > > Cc: jarkko@kernel.org > > Cc: Peter Huewe <peterhuewe@gmx.de> > > Cc: Jason Gunthorpe <jgg@ziepe.ca> > > Cc: Hans de Goede <hdegoede@redhat.com> > > Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com> > > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > > Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> > > I'll apply this and make it available in linux-next. Applied. Thank you. /Jarkko
Matthew Garrett @ 2020-10-15 15:39 MST: > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >> >> There is a misconfiguration in the bios of the gpio pin used for the >> interrupt in the T490s. When interrupts are enabled in the tpm_tis >> driver code this results in an interrupt storm. This was initially >> reported when we attempted to enable the interrupt code in the tpm_tis >> driver, which previously wasn't setting a flag to enable it. Due to >> the reports of the interrupt storm that code was reverted and we went back >> to polling instead of using interrupts. Now that we know the T490s problem >> is a firmware issue, add code to check if the system is a T490s and >> disable interrupts if that is the case. This will allow us to enable >> interrupts for everyone else. If the user has a fixed bios they can >> force the enabling of interrupts with tpm_tis.interrupts=1 on the >> kernel command line. > > I think an implication of this is that systems haven't been > well-tested with interrupts enabled. In general when we've found a > firmware issue in one place it ends up happening elsewhere as well, so > it wouldn't surprise me if there are other machines that will also be > unhappy with interrupts enabled. Would it be possible to automatically > detect this case (eg, if we get more than a certain number of > interrupts in a certain timeframe immediately after enabling the > interrupt) and automatically fall back to polling in that case? It > would also mean that users with fixed firmware wouldn't need to pass a > parameter. I believe Matthew is correct here. I found another system today with completely different vendor for both the system and the tpm chip. In addition another Lenovo model, the L490, has the issue. This initial attempt at a solution like Matthew suggested works on the system I found today, but I imagine it is all sorts of wrong. In the 2 systems where I've seen it, there are about 100000 interrupts in around 1.5 seconds, and then the irq code shuts down the interrupt because they aren't being handled. diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index 49ae09ac604f..478e9d02a3fa 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -27,6 +27,11 @@ #include "tpm.h" #include "tpm_tis_core.h" +static unsigned int time_start = 0; +static bool storm_check = true; +static bool storm_killed = false; +static u32 irqs_fired = 0; + static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) return rc; } -static void disable_interrupts(struct tpm_chip *chip) +static void __disable_interrupts(struct tpm_chip *chip) { struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); u32 intmask; int rc; - if (priv->irq == 0) - return; - rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); if (rc < 0) intmask = 0; intmask &= ~TPM_GLOBAL_INT_ENABLE; rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); + chip->flags &= ~TPM_CHIP_FLAG_IRQ; +} + +static void disable_interrupts(struct tpm_chip *chip) +{ + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); + if (priv->irq == 0) + return; + + __disable_interrupts(chip); devm_free_irq(chip->dev.parent, priv->irq, chip); priv->irq = 0; - chip->flags &= ~TPM_CHIP_FLAG_IRQ; } /* @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) int rc, irq; struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); + if (unlikely(storm_killed)) { + devm_free_irq(chip->dev.parent, priv->irq, chip); + priv->irq = 0; + storm_killed = false; + } + if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested) return tpm_tis_send_main(chip, buf, len); @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id) u32 interrupt; int i, rc; + if (storm_check) { + irqs_fired++; + + if (!time_start) { + time_start = jiffies_to_msecs(jiffies); + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) { + __disable_interrupts(chip); + storm_check = false; + storm_killed = true; + return IRQ_HANDLED; + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) { + storm_check = false; + } + } + rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt); if (rc < 0) return IRQ_NONE;
Hi, On 11/19/20 7:36 AM, Jerry Snitselaar wrote: > > Matthew Garrett @ 2020-10-15 15:39 MST: > >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >>> >>> There is a misconfiguration in the bios of the gpio pin used for the >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis >>> driver code this results in an interrupt storm. This was initially >>> reported when we attempted to enable the interrupt code in the tpm_tis >>> driver, which previously wasn't setting a flag to enable it. Due to >>> the reports of the interrupt storm that code was reverted and we went back >>> to polling instead of using interrupts. Now that we know the T490s problem >>> is a firmware issue, add code to check if the system is a T490s and >>> disable interrupts if that is the case. This will allow us to enable >>> interrupts for everyone else. If the user has a fixed bios they can >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the >>> kernel command line. >> >> I think an implication of this is that systems haven't been >> well-tested with interrupts enabled. In general when we've found a >> firmware issue in one place it ends up happening elsewhere as well, so >> it wouldn't surprise me if there are other machines that will also be >> unhappy with interrupts enabled. Would it be possible to automatically >> detect this case (eg, if we get more than a certain number of >> interrupts in a certain timeframe immediately after enabling the >> interrupt) and automatically fall back to polling in that case? It >> would also mean that users with fixed firmware wouldn't need to pass a >> parameter. > > I believe Matthew is correct here. I found another system today > with completely different vendor for both the system and the tpm chip. > In addition another Lenovo model, the L490, has the issue. > > This initial attempt at a solution like Matthew suggested works on > the system I found today, but I imagine it is all sorts of wrong. > In the 2 systems where I've seen it, there are about 100000 interrupts > in around 1.5 seconds, and then the irq code shuts down the interrupt > because they aren't being handled. Is that with your patch? The IRQ should be silenced as soon as devm_free_irq(chip->dev.parent, priv->irq, chip); is called. Depending on if we can get your storm-detection to work or not, we might also choose to just never try to use the IRQ (at least on x86 systems). AFAIK the TPM is never used for high-throughput stuff so the polling overhead should not be a big deal (and I'm getting the feeling that Windows always polls). Regards, Hans > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > index 49ae09ac604f..478e9d02a3fa 100644 > --- a/drivers/char/tpm/tpm_tis_core.c > +++ b/drivers/char/tpm/tpm_tis_core.c > @@ -27,6 +27,11 @@ > #include "tpm.h" > #include "tpm_tis_core.h" > > +static unsigned int time_start = 0; > +static bool storm_check = true; > +static bool storm_killed = false; > +static u32 irqs_fired = 0; > + > static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); > > static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) > @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) > return rc; > } > > -static void disable_interrupts(struct tpm_chip *chip) > +static void __disable_interrupts(struct tpm_chip *chip) > { > struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > u32 intmask; > int rc; > > - if (priv->irq == 0) > - return; > - > rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > if (rc < 0) > intmask = 0; > > intmask &= ~TPM_GLOBAL_INT_ENABLE; > rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > + chip->flags &= ~TPM_CHIP_FLAG_IRQ; > +} > + > +static void disable_interrupts(struct tpm_chip *chip) > +{ > + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > > + if (priv->irq == 0) > + return; > + > + __disable_interrupts(chip); > devm_free_irq(chip->dev.parent, priv->irq, chip); > priv->irq = 0; > - chip->flags &= ~TPM_CHIP_FLAG_IRQ; > } > > /* > @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) > int rc, irq; > struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > > + if (unlikely(storm_killed)) { > + devm_free_irq(chip->dev.parent, priv->irq, chip); > + priv->irq = 0; > + storm_killed = false; > + } > + > if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested) > return tpm_tis_send_main(chip, buf, len); > > @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id) > u32 interrupt; > int i, rc; > > + if (storm_check) { > + irqs_fired++; > + > + if (!time_start) { > + time_start = jiffies_to_msecs(jiffies); > + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) { > + __disable_interrupts(chip); > + storm_check = false; > + storm_killed = true; > + return IRQ_HANDLED; > + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) { > + storm_check = false; > + } > + } > + > rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt); > if (rc < 0) > return IRQ_NONE; >
Hans de Goede @ 2020-11-19 07:42 MST: > Hi, > > On 11/19/20 7:36 AM, Jerry Snitselaar wrote: >> >> Matthew Garrett @ 2020-10-15 15:39 MST: >> >>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >>>> >>>> There is a misconfiguration in the bios of the gpio pin used for the >>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis >>>> driver code this results in an interrupt storm. This was initially >>>> reported when we attempted to enable the interrupt code in the tpm_tis >>>> driver, which previously wasn't setting a flag to enable it. Due to >>>> the reports of the interrupt storm that code was reverted and we went back >>>> to polling instead of using interrupts. Now that we know the T490s problem >>>> is a firmware issue, add code to check if the system is a T490s and >>>> disable interrupts if that is the case. This will allow us to enable >>>> interrupts for everyone else. If the user has a fixed bios they can >>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the >>>> kernel command line. >>> >>> I think an implication of this is that systems haven't been >>> well-tested with interrupts enabled. In general when we've found a >>> firmware issue in one place it ends up happening elsewhere as well, so >>> it wouldn't surprise me if there are other machines that will also be >>> unhappy with interrupts enabled. Would it be possible to automatically >>> detect this case (eg, if we get more than a certain number of >>> interrupts in a certain timeframe immediately after enabling the >>> interrupt) and automatically fall back to polling in that case? It >>> would also mean that users with fixed firmware wouldn't need to pass a >>> parameter. >> >> I believe Matthew is correct here. I found another system today >> with completely different vendor for both the system and the tpm chip. >> In addition another Lenovo model, the L490, has the issue. >> >> This initial attempt at a solution like Matthew suggested works on >> the system I found today, but I imagine it is all sorts of wrong. >> In the 2 systems where I've seen it, there are about 100000 interrupts >> in around 1.5 seconds, and then the irq code shuts down the interrupt >> because they aren't being handled. > > Is that with your patch? The IRQ should be silenced as soon as > devm_free_irq(chip->dev.parent, priv->irq, chip); is called. > No that is just with James' patchset that enables interrupts for tpm_tis. It looks like the irq is firing, but the tpm's int_status register is clear, so the handler immediately returns IRQ_NONE. After that happens 100000 times the core irq code shuts down the irq, but it isn't released so I could still see the stats in /proc/interrupts. With my attempt below on top of that patchset it releases the irq. I had to stick the check prior to it checking the int_status register because it is cleared and the handler returns, and I couldn't do the devm_free_irq directly in tis_int_handler, so I tried sticking it in tpm_tis_send where the other odd irq testing code was already located. I'm not sure if there is another place that would work better for calling the devm_free_irq. > Depending on if we can get your storm-detection to work or not, > we might also choose to just never try to use the IRQ (at least on > x86 systems). AFAIK the TPM is never used for high-throughput stuff > so the polling overhead should not be a big deal (and I'm getting the feeling > that Windows always polls). > I was wondering about Windows as well. In addition to the Lenovo systems which the T490s had Nuvoton tpm, the system I found yesterday was a development system we have from a partner with an Infineon tpm. Dan Williams has seen it internally at Intel as well on some system. > Regards, > > Hans > > > >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c >> index 49ae09ac604f..478e9d02a3fa 100644 >> --- a/drivers/char/tpm/tpm_tis_core.c >> +++ b/drivers/char/tpm/tpm_tis_core.c >> @@ -27,6 +27,11 @@ >> #include "tpm.h" >> #include "tpm_tis_core.h" >> >> +static unsigned int time_start = 0; >> +static bool storm_check = true; >> +static bool storm_killed = false; >> +static u32 irqs_fired = 0; >> + >> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); >> >> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) >> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) >> return rc; >> } >> >> -static void disable_interrupts(struct tpm_chip *chip) >> +static void __disable_interrupts(struct tpm_chip *chip) >> { >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> u32 intmask; >> int rc; >> >> - if (priv->irq == 0) >> - return; >> - >> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); >> if (rc < 0) >> intmask = 0; >> >> intmask &= ~TPM_GLOBAL_INT_ENABLE; >> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); >> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; >> +} >> + >> +static void disable_interrupts(struct tpm_chip *chip) >> +{ >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> >> + if (priv->irq == 0) >> + return; >> + >> + __disable_interrupts(chip); >> devm_free_irq(chip->dev.parent, priv->irq, chip); >> priv->irq = 0; >> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; >> } >> >> /* >> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) >> int rc, irq; >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> >> + if (unlikely(storm_killed)) { >> + devm_free_irq(chip->dev.parent, priv->irq, chip); >> + priv->irq = 0; >> + storm_killed = false; >> + } >> + >> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested) >> return tpm_tis_send_main(chip, buf, len); >> >> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id) >> u32 interrupt; >> int i, rc; >> >> + if (storm_check) { >> + irqs_fired++; >> + >> + if (!time_start) { >> + time_start = jiffies_to_msecs(jiffies); >> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) { >> + __disable_interrupts(chip); >> + storm_check = false; >> + storm_killed = true; >> + return IRQ_HANDLED; >> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) { >> + storm_check = false; >> + } >> + } >> + >> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt); >> if (rc < 0) >> return IRQ_NONE; >>
Hi, On 11/19/20 6:05 PM, Jerry Snitselaar wrote: > > Hans de Goede @ 2020-11-19 07:42 MST: > >> Hi, >> >> On 11/19/20 7:36 AM, Jerry Snitselaar wrote: >>> >>> Matthew Garrett @ 2020-10-15 15:39 MST: >>> >>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >>>>> >>>>> There is a misconfiguration in the bios of the gpio pin used for the >>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis >>>>> driver code this results in an interrupt storm. This was initially >>>>> reported when we attempted to enable the interrupt code in the tpm_tis >>>>> driver, which previously wasn't setting a flag to enable it. Due to >>>>> the reports of the interrupt storm that code was reverted and we went back >>>>> to polling instead of using interrupts. Now that we know the T490s problem >>>>> is a firmware issue, add code to check if the system is a T490s and >>>>> disable interrupts if that is the case. This will allow us to enable >>>>> interrupts for everyone else. If the user has a fixed bios they can >>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the >>>>> kernel command line. >>>> >>>> I think an implication of this is that systems haven't been >>>> well-tested with interrupts enabled. In general when we've found a >>>> firmware issue in one place it ends up happening elsewhere as well, so >>>> it wouldn't surprise me if there are other machines that will also be >>>> unhappy with interrupts enabled. Would it be possible to automatically >>>> detect this case (eg, if we get more than a certain number of >>>> interrupts in a certain timeframe immediately after enabling the >>>> interrupt) and automatically fall back to polling in that case? It >>>> would also mean that users with fixed firmware wouldn't need to pass a >>>> parameter. >>> >>> I believe Matthew is correct here. I found another system today >>> with completely different vendor for both the system and the tpm chip. >>> In addition another Lenovo model, the L490, has the issue. >>> >>> This initial attempt at a solution like Matthew suggested works on >>> the system I found today, but I imagine it is all sorts of wrong. >>> In the 2 systems where I've seen it, there are about 100000 interrupts >>> in around 1.5 seconds, and then the irq code shuts down the interrupt >>> because they aren't being handled. >> >> Is that with your patch? The IRQ should be silenced as soon as >> devm_free_irq(chip->dev.parent, priv->irq, chip); is called. >> > > No that is just with James' patchset that enables interrupts for > tpm_tis. It looks like the irq is firing, but the tpm's int_status > register is clear, so the handler immediately returns IRQ_NONE. After > that happens 100000 times the core irq code shuts down the irq, but it > isn't released so I could still see the stats in /proc/interrupts. I see, yes I have seen this behavior on the X1C8 with a pre-production BIOS. > With > my attempt below on top of that patchset it releases the irq. I had to > stick the check prior to it checking the int_status register because it > is cleared and the handler returns, Ack. > and I couldn't do the devm_free_irq > directly in tis_int_handler, so I tried sticking it in tpm_tis_send > where the other odd irq testing code was already located. I'm not sure > if there is another place that would work better for calling the > devm_free_irq. Adding it together with the other IRQ testing related code seems fine to me. >> Depending on if we can get your storm-detection to work or not, >> we might also choose to just never try to use the IRQ (at least on >> x86 systems). AFAIK the TPM is never used for high-throughput stuff >> so the polling overhead should not be a big deal (and I'm getting the feeling >> that Windows always polls). >> > > I was wondering about Windows as well. In addition to the Lenovo systems > which the T490s had Nuvoton tpm, the system I found yesterday was a development > system we have from a partner with an Infineon tpm. Dan Williams has > seen it internally at Intel as well on some system. Sounds to me like Windows never uses the IRQ, so we get the fun of finding all the untested firmware bugs. So if we cannot get this detection code to work reliable, maybe we should just not use the IRQ at all, at least on X86 + ACPI systems? Anyways lets try this storm-detection thing first, but I have the feeling we should not spend too much time on this. Just outright disabling IRQ support might be better. REgards, Hans
On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote: > > Matthew Garrett @ 2020-10-15 15:39 MST: > > > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: > >> > >> There is a misconfiguration in the bios of the gpio pin used for the > >> interrupt in the T490s. When interrupts are enabled in the tpm_tis > >> driver code this results in an interrupt storm. This was initially > >> reported when we attempted to enable the interrupt code in the tpm_tis > >> driver, which previously wasn't setting a flag to enable it. Due to > >> the reports of the interrupt storm that code was reverted and we went back > >> to polling instead of using interrupts. Now that we know the T490s problem > >> is a firmware issue, add code to check if the system is a T490s and > >> disable interrupts if that is the case. This will allow us to enable > >> interrupts for everyone else. If the user has a fixed bios they can > >> force the enabling of interrupts with tpm_tis.interrupts=1 on the > >> kernel command line. > > > > I think an implication of this is that systems haven't been > > well-tested with interrupts enabled. In general when we've found a > > firmware issue in one place it ends up happening elsewhere as well, so > > it wouldn't surprise me if there are other machines that will also be > > unhappy with interrupts enabled. Would it be possible to automatically > > detect this case (eg, if we get more than a certain number of > > interrupts in a certain timeframe immediately after enabling the > > interrupt) and automatically fall back to polling in that case? It > > would also mean that users with fixed firmware wouldn't need to pass a > > parameter. > > I believe Matthew is correct here. I found another system today > with completely different vendor for both the system and the tpm chip. > In addition another Lenovo model, the L490, has the issue. > > This initial attempt at a solution like Matthew suggested works on > the system I found today, but I imagine it is all sorts of wrong. > In the 2 systems where I've seen it, there are about 100000 interrupts > in around 1.5 seconds, and then the irq code shuts down the interrupt > because they aren't being handled. > > > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > index 49ae09ac604f..478e9d02a3fa 100644 > --- a/drivers/char/tpm/tpm_tis_core.c > +++ b/drivers/char/tpm/tpm_tis_core.c > @@ -27,6 +27,11 @@ > #include "tpm.h" > #include "tpm_tis_core.h" > > +static unsigned int time_start = 0; > +static bool storm_check = true; > +static bool storm_killed = false; > +static u32 irqs_fired = 0; Maybe kstat_irqs() would be a better idea than ad hoc stats. > + > static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); > > static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) > @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) > return rc; > } > > -static void disable_interrupts(struct tpm_chip *chip) > +static void __disable_interrupts(struct tpm_chip *chip) > { > struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > u32 intmask; > int rc; > > - if (priv->irq == 0) > - return; > - > rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > if (rc < 0) > intmask = 0; > > intmask &= ~TPM_GLOBAL_INT_ENABLE; > rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > + chip->flags &= ~TPM_CHIP_FLAG_IRQ; > +} > + > +static void disable_interrupts(struct tpm_chip *chip) > +{ > + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > > + if (priv->irq == 0) > + return; > + > + __disable_interrupts(chip); > devm_free_irq(chip->dev.parent, priv->irq, chip); > priv->irq = 0; > - chip->flags &= ~TPM_CHIP_FLAG_IRQ; > } > > /* > @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) > int rc, irq; > struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > > + if (unlikely(storm_killed)) { > + devm_free_irq(chip->dev.parent, priv->irq, chip); > + priv->irq = 0; > + storm_killed = false; > + } OK this kind of bad solution because if tpm_tis_send() is not called, then IRQ is never freed. AFAIK, devres_* do not sleep but use spin lock, i.e. you could render out both storm_check and storm_killed. > + > if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested) > return tpm_tis_send_main(chip, buf, len); > > @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id) > u32 interrupt; > int i, rc; > > + if (storm_check) { > + irqs_fired++; > + > + if (!time_start) { > + time_start = jiffies_to_msecs(jiffies); > + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) { > + __disable_interrupts(chip); > + storm_check = false; > + storm_killed = true; > + return IRQ_HANDLED; > + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) { > + storm_check = false; > + } > + } > + > rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt); > if (rc < 0) > return IRQ_NONE; > > /Jarkko
On Thu, Nov 19, 2020 at 03:42:35PM +0100, Hans de Goede wrote: > Hi, > > On 11/19/20 7:36 AM, Jerry Snitselaar wrote: > > > > Matthew Garrett @ 2020-10-15 15:39 MST: > > > >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: > >>> > >>> There is a misconfiguration in the bios of the gpio pin used for the > >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis > >>> driver code this results in an interrupt storm. This was initially > >>> reported when we attempted to enable the interrupt code in the tpm_tis > >>> driver, which previously wasn't setting a flag to enable it. Due to > >>> the reports of the interrupt storm that code was reverted and we went back > >>> to polling instead of using interrupts. Now that we know the T490s problem > >>> is a firmware issue, add code to check if the system is a T490s and > >>> disable interrupts if that is the case. This will allow us to enable > >>> interrupts for everyone else. If the user has a fixed bios they can > >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the > >>> kernel command line. > >> > >> I think an implication of this is that systems haven't been > >> well-tested with interrupts enabled. In general when we've found a > >> firmware issue in one place it ends up happening elsewhere as well, so > >> it wouldn't surprise me if there are other machines that will also be > >> unhappy with interrupts enabled. Would it be possible to automatically > >> detect this case (eg, if we get more than a certain number of > >> interrupts in a certain timeframe immediately after enabling the > >> interrupt) and automatically fall back to polling in that case? It > >> would also mean that users with fixed firmware wouldn't need to pass a > >> parameter. > > > > I believe Matthew is correct here. I found another system today > > with completely different vendor for both the system and the tpm chip. > > In addition another Lenovo model, the L490, has the issue. > > > > This initial attempt at a solution like Matthew suggested works on > > the system I found today, but I imagine it is all sorts of wrong. > > In the 2 systems where I've seen it, there are about 100000 interrupts > > in around 1.5 seconds, and then the irq code shuts down the interrupt > > because they aren't being handled. > > Is that with your patch? The IRQ should be silenced as soon as > devm_free_irq(chip->dev.parent, priv->irq, chip); is called. > > Depending on if we can get your storm-detection to work or not, > we might also choose to just never try to use the IRQ (at least on > x86 systems). AFAIK the TPM is never used for high-throughput stuff > so the polling overhead should not be a big deal (and I'm getting the feeling > that Windows always polls). > > Regards, > > Hans Yeah, this is what I've been wondering for a while. Why could not we just strip off IRQ code? Why does it matter? /Jarkko
On Tue, Nov 24, 2020 at 05:27:30AM +0200, Jarkko Sakkinen wrote: > On Thu, Nov 19, 2020 at 03:42:35PM +0100, Hans de Goede wrote: > > Hi, > > > > On 11/19/20 7:36 AM, Jerry Snitselaar wrote: > > > > > > Matthew Garrett @ 2020-10-15 15:39 MST: > > > > > >> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: > > >>> > > >>> There is a misconfiguration in the bios of the gpio pin used for the > > >>> interrupt in the T490s. When interrupts are enabled in the tpm_tis > > >>> driver code this results in an interrupt storm. This was initially > > >>> reported when we attempted to enable the interrupt code in the tpm_tis > > >>> driver, which previously wasn't setting a flag to enable it. Due to > > >>> the reports of the interrupt storm that code was reverted and we went back > > >>> to polling instead of using interrupts. Now that we know the T490s problem > > >>> is a firmware issue, add code to check if the system is a T490s and > > >>> disable interrupts if that is the case. This will allow us to enable > > >>> interrupts for everyone else. If the user has a fixed bios they can > > >>> force the enabling of interrupts with tpm_tis.interrupts=1 on the > > >>> kernel command line. > > >> > > >> I think an implication of this is that systems haven't been > > >> well-tested with interrupts enabled. In general when we've found a > > >> firmware issue in one place it ends up happening elsewhere as well, so > > >> it wouldn't surprise me if there are other machines that will also be > > >> unhappy with interrupts enabled. Would it be possible to automatically > > >> detect this case (eg, if we get more than a certain number of > > >> interrupts in a certain timeframe immediately after enabling the > > >> interrupt) and automatically fall back to polling in that case? It > > >> would also mean that users with fixed firmware wouldn't need to pass a > > >> parameter. > > > > > > I believe Matthew is correct here. I found another system today > > > with completely different vendor for both the system and the tpm chip. > > > In addition another Lenovo model, the L490, has the issue. > > > > > > This initial attempt at a solution like Matthew suggested works on > > > the system I found today, but I imagine it is all sorts of wrong. > > > In the 2 systems where I've seen it, there are about 100000 interrupts > > > in around 1.5 seconds, and then the irq code shuts down the interrupt > > > because they aren't being handled. > > > > Is that with your patch? The IRQ should be silenced as soon as > > devm_free_irq(chip->dev.parent, priv->irq, chip); is called. > > > > Depending on if we can get your storm-detection to work or not, > > we might also choose to just never try to use the IRQ (at least on > > x86 systems). AFAIK the TPM is never used for high-throughput stuff > > so the polling overhead should not be a big deal (and I'm getting the feeling > > that Windows always polls). > > > > Regards, > > > > Hans > > Yeah, this is what I've been wondering for a while. Why could not we > just strip off IRQ code? Why does it matter? And we DO NOT use interrupts in tpm_crb and nobody has ever complained. /Jarkko
Jarkko Sakkinen @ 2020-11-23 20:26 MST: > On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote: >> >> Matthew Garrett @ 2020-10-15 15:39 MST: >> >> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >> >> >> >> There is a misconfiguration in the bios of the gpio pin used for the >> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis >> >> driver code this results in an interrupt storm. This was initially >> >> reported when we attempted to enable the interrupt code in the tpm_tis >> >> driver, which previously wasn't setting a flag to enable it. Due to >> >> the reports of the interrupt storm that code was reverted and we went back >> >> to polling instead of using interrupts. Now that we know the T490s problem >> >> is a firmware issue, add code to check if the system is a T490s and >> >> disable interrupts if that is the case. This will allow us to enable >> >> interrupts for everyone else. If the user has a fixed bios they can >> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the >> >> kernel command line. >> > >> > I think an implication of this is that systems haven't been >> > well-tested with interrupts enabled. In general when we've found a >> > firmware issue in one place it ends up happening elsewhere as well, so >> > it wouldn't surprise me if there are other machines that will also be >> > unhappy with interrupts enabled. Would it be possible to automatically >> > detect this case (eg, if we get more than a certain number of >> > interrupts in a certain timeframe immediately after enabling the >> > interrupt) and automatically fall back to polling in that case? It >> > would also mean that users with fixed firmware wouldn't need to pass a >> > parameter. >> >> I believe Matthew is correct here. I found another system today >> with completely different vendor for both the system and the tpm chip. >> In addition another Lenovo model, the L490, has the issue. >> >> This initial attempt at a solution like Matthew suggested works on >> the system I found today, but I imagine it is all sorts of wrong. >> In the 2 systems where I've seen it, there are about 100000 interrupts >> in around 1.5 seconds, and then the irq code shuts down the interrupt >> because they aren't being handled. >> >> >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c >> index 49ae09ac604f..478e9d02a3fa 100644 >> --- a/drivers/char/tpm/tpm_tis_core.c >> +++ b/drivers/char/tpm/tpm_tis_core.c >> @@ -27,6 +27,11 @@ >> #include "tpm.h" >> #include "tpm_tis_core.h" >> >> +static unsigned int time_start = 0; >> +static bool storm_check = true; >> +static bool storm_killed = false; >> +static u32 irqs_fired = 0; > > Maybe kstat_irqs() would be a better idea than ad hoc stats. > Thanks, yes that would be better. >> + >> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); >> >> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) >> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) >> return rc; >> } >> >> -static void disable_interrupts(struct tpm_chip *chip) >> +static void __disable_interrupts(struct tpm_chip *chip) >> { >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> u32 intmask; >> int rc; >> >> - if (priv->irq == 0) >> - return; >> - >> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); >> if (rc < 0) >> intmask = 0; >> >> intmask &= ~TPM_GLOBAL_INT_ENABLE; >> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); >> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; >> +} >> + >> +static void disable_interrupts(struct tpm_chip *chip) >> +{ >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> >> + if (priv->irq == 0) >> + return; >> + >> + __disable_interrupts(chip); >> devm_free_irq(chip->dev.parent, priv->irq, chip); >> priv->irq = 0; >> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; >> } >> >> /* >> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) >> int rc, irq; >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >> >> + if (unlikely(storm_killed)) { >> + devm_free_irq(chip->dev.parent, priv->irq, chip); >> + priv->irq = 0; >> + storm_killed = false; >> + } > > OK this kind of bad solution because if tpm_tis_send() is not called, > then IRQ is never freed. AFAIK, devres_* do not sleep but use spin > lock, i.e. you could render out both storm_check and storm_killed. > Is there a way to flag it for freeing later while in an interrupt context? I'm not sure where to clean it up since devm_free_irq can't be called in tis_int_handler. Before diving further into that though, does anyone else have an opinion on ripping out the irq code, and just using polling? We've been only polling since 2015 anyways. >> + >> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested) >> return tpm_tis_send_main(chip, buf, len); >> >> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id) >> u32 interrupt; >> int i, rc; >> >> + if (storm_check) { >> + irqs_fired++; >> + >> + if (!time_start) { >> + time_start = jiffies_to_msecs(jiffies); >> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) { >> + __disable_interrupts(chip); >> + storm_check = false; >> + storm_killed = true; >> + return IRQ_HANDLED; >> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) { >> + storm_check = false; >> + } >> + } >> + >> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt); >> if (rc < 0) >> return IRQ_NONE; >> >> > > /Jarkko
On Tue, 2020-11-24 at 10:52 -0700, Jerry Snitselaar wrote: > Before diving further into that though, does anyone else have an > opinion on ripping out the irq code, and just using polling? We've > been only polling since 2015 anyways. Well only a biased one, obviously: polling causes large amounts of busy waiting, which is a waste of CPU resources and does increase the time it takes us to do TPM operations ... not a concern if you're doing long computation ones, like signatures, but it is a problem for short operations like bulk updates of PCRs. The other potential issue, as we saw with atmel is that if you prod the chip too often (which you have to do with polling) you risk upsetting it. We've spent ages trying to tune the polling parameters to balance reduction of busy wait with chip upset and still, apparently, not quite got it right. If the TPM has a functioning IRQ then it gets us out of the whole polling mess entirely. The big question is how many chips that report an IRQ actually have a malfunctioning one? James
Hi, On 11/24/20 6:52 PM, Jerry Snitselaar wrote: > > Jarkko Sakkinen @ 2020-11-23 20:26 MST: > >> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote: >>> >>> Matthew Garrett @ 2020-10-15 15:39 MST: >>> >>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >>>>> >>>>> There is a misconfiguration in the bios of the gpio pin used for the >>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis >>>>> driver code this results in an interrupt storm. This was initially >>>>> reported when we attempted to enable the interrupt code in the tpm_tis >>>>> driver, which previously wasn't setting a flag to enable it. Due to >>>>> the reports of the interrupt storm that code was reverted and we went back >>>>> to polling instead of using interrupts. Now that we know the T490s problem >>>>> is a firmware issue, add code to check if the system is a T490s and >>>>> disable interrupts if that is the case. This will allow us to enable >>>>> interrupts for everyone else. If the user has a fixed bios they can >>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the >>>>> kernel command line. >>>> >>>> I think an implication of this is that systems haven't been >>>> well-tested with interrupts enabled. In general when we've found a >>>> firmware issue in one place it ends up happening elsewhere as well, so >>>> it wouldn't surprise me if there are other machines that will also be >>>> unhappy with interrupts enabled. Would it be possible to automatically >>>> detect this case (eg, if we get more than a certain number of >>>> interrupts in a certain timeframe immediately after enabling the >>>> interrupt) and automatically fall back to polling in that case? It >>>> would also mean that users with fixed firmware wouldn't need to pass a >>>> parameter. >>> >>> I believe Matthew is correct here. I found another system today >>> with completely different vendor for both the system and the tpm chip. >>> In addition another Lenovo model, the L490, has the issue. >>> >>> This initial attempt at a solution like Matthew suggested works on >>> the system I found today, but I imagine it is all sorts of wrong. >>> In the 2 systems where I've seen it, there are about 100000 interrupts >>> in around 1.5 seconds, and then the irq code shuts down the interrupt >>> because they aren't being handled. >>> >>> >>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c >>> index 49ae09ac604f..478e9d02a3fa 100644 >>> --- a/drivers/char/tpm/tpm_tis_core.c >>> +++ b/drivers/char/tpm/tpm_tis_core.c >>> @@ -27,6 +27,11 @@ >>> #include "tpm.h" >>> #include "tpm_tis_core.h" >>> >>> +static unsigned int time_start = 0; >>> +static bool storm_check = true; >>> +static bool storm_killed = false; >>> +static u32 irqs_fired = 0; >> >> Maybe kstat_irqs() would be a better idea than ad hoc stats. >> > > Thanks, yes that would be better. > >>> + >>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); >>> >>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) >>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) >>> return rc; >>> } >>> >>> -static void disable_interrupts(struct tpm_chip *chip) >>> +static void __disable_interrupts(struct tpm_chip *chip) >>> { >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >>> u32 intmask; >>> int rc; >>> >>> - if (priv->irq == 0) >>> - return; >>> - >>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); >>> if (rc < 0) >>> intmask = 0; >>> >>> intmask &= ~TPM_GLOBAL_INT_ENABLE; >>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); >>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; >>> +} >>> + >>> +static void disable_interrupts(struct tpm_chip *chip) >>> +{ >>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >>> >>> + if (priv->irq == 0) >>> + return; >>> + >>> + __disable_interrupts(chip); >>> devm_free_irq(chip->dev.parent, priv->irq, chip); >>> priv->irq = 0; >>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; >>> } >>> >>> /* >>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) >>> int rc, irq; >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >>> >>> + if (unlikely(storm_killed)) { >>> + devm_free_irq(chip->dev.parent, priv->irq, chip); >>> + priv->irq = 0; >>> + storm_killed = false; >>> + } >> >> OK this kind of bad solution because if tpm_tis_send() is not called, >> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin >> lock, i.e. you could render out both storm_check and storm_killed. >> > > Is there a way to flag it for freeing later while in an interrupt > context? I'm not sure where to clean it up since devm_free_irq can't be > called in tis_int_handler. You could add a workqueue work-struct just for this and queue that up to do the free when you detect the storm. That will then run pretty much immediately, avoiding the storm going on for (much) longer. > Before diving further into that though, does anyone else have an opinion > on ripping out the irq code, and just using polling? We've been only > polling since 2015 anyways. Given James Bottomley's reply I guess it would be worthwhile to get the storm detection to work. Regards, Hans > >>> + >>> if (!(chip->flags & TPM_CHIP_FLAG_IRQ) || priv->irq_tested) >>> return tpm_tis_send_main(chip, buf, len); >>> >>> @@ -748,6 +765,21 @@ static irqreturn_t tis_int_handler(int dummy, void *dev_id) >>> u32 interrupt; >>> int i, rc; >>> >>> + if (storm_check) { >>> + irqs_fired++; >>> + >>> + if (!time_start) { >>> + time_start = jiffies_to_msecs(jiffies); >>> + } else if ((irqs_fired > 1000) && (jiffies_to_msecs(jiffies) - jiffies < 500)) { >>> + __disable_interrupts(chip); >>> + storm_check = false; >>> + storm_killed = true; >>> + return IRQ_HANDLED; >>> + } else if ((jiffies_to_msecs(jiffies) - time_start > 500) && (irqs_fired < 1000)) { >>> + storm_check = false; >>> + } >>> + } >>> + >>> rc = tpm_tis_read32(priv, TPM_INT_STATUS(priv->locality), &interrupt); >>> if (rc < 0) >>> return IRQ_NONE; >>> >>> >> >> /Jarkko >
On Tue, Nov 24, 2020 at 10:52:56AM -0700, Jerry Snitselaar wrote: > > Jarkko Sakkinen @ 2020-11-23 20:26 MST: > > > On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote: > >> > >> Matthew Garrett @ 2020-10-15 15:39 MST: > >> > >> > On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: > >> >> > >> >> There is a misconfiguration in the bios of the gpio pin used for the > >> >> interrupt in the T490s. When interrupts are enabled in the tpm_tis > >> >> driver code this results in an interrupt storm. This was initially > >> >> reported when we attempted to enable the interrupt code in the tpm_tis > >> >> driver, which previously wasn't setting a flag to enable it. Due to > >> >> the reports of the interrupt storm that code was reverted and we went back > >> >> to polling instead of using interrupts. Now that we know the T490s problem > >> >> is a firmware issue, add code to check if the system is a T490s and > >> >> disable interrupts if that is the case. This will allow us to enable > >> >> interrupts for everyone else. If the user has a fixed bios they can > >> >> force the enabling of interrupts with tpm_tis.interrupts=1 on the > >> >> kernel command line. > >> > > >> > I think an implication of this is that systems haven't been > >> > well-tested with interrupts enabled. In general when we've found a > >> > firmware issue in one place it ends up happening elsewhere as well, so > >> > it wouldn't surprise me if there are other machines that will also be > >> > unhappy with interrupts enabled. Would it be possible to automatically > >> > detect this case (eg, if we get more than a certain number of > >> > interrupts in a certain timeframe immediately after enabling the > >> > interrupt) and automatically fall back to polling in that case? It > >> > would also mean that users with fixed firmware wouldn't need to pass a > >> > parameter. > >> > >> I believe Matthew is correct here. I found another system today > >> with completely different vendor for both the system and the tpm chip. > >> In addition another Lenovo model, the L490, has the issue. > >> > >> This initial attempt at a solution like Matthew suggested works on > >> the system I found today, but I imagine it is all sorts of wrong. > >> In the 2 systems where I've seen it, there are about 100000 interrupts > >> in around 1.5 seconds, and then the irq code shuts down the interrupt > >> because they aren't being handled. > >> > >> > >> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > >> index 49ae09ac604f..478e9d02a3fa 100644 > >> --- a/drivers/char/tpm/tpm_tis_core.c > >> +++ b/drivers/char/tpm/tpm_tis_core.c > >> @@ -27,6 +27,11 @@ > >> #include "tpm.h" > >> #include "tpm_tis_core.h" > >> > >> +static unsigned int time_start = 0; > >> +static bool storm_check = true; > >> +static bool storm_killed = false; > >> +static u32 irqs_fired = 0; > > > > Maybe kstat_irqs() would be a better idea than ad hoc stats. > > > > Thanks, yes that would be better. > > >> + > >> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); > >> > >> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) > >> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) > >> return rc; > >> } > >> > >> -static void disable_interrupts(struct tpm_chip *chip) > >> +static void __disable_interrupts(struct tpm_chip *chip) > >> { > >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >> u32 intmask; > >> int rc; > >> > >> - if (priv->irq == 0) > >> - return; > >> - > >> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > >> if (rc < 0) > >> intmask = 0; > >> > >> intmask &= ~TPM_GLOBAL_INT_ENABLE; > >> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > >> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >> +} > >> + > >> +static void disable_interrupts(struct tpm_chip *chip) > >> +{ > >> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >> > >> + if (priv->irq == 0) > >> + return; > >> + > >> + __disable_interrupts(chip); > >> devm_free_irq(chip->dev.parent, priv->irq, chip); > >> priv->irq = 0; > >> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >> } > >> > >> /* > >> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) > >> int rc, irq; > >> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >> > >> + if (unlikely(storm_killed)) { > >> + devm_free_irq(chip->dev.parent, priv->irq, chip); > >> + priv->irq = 0; > >> + storm_killed = false; > >> + } > > > > OK this kind of bad solution because if tpm_tis_send() is not called, > > then IRQ is never freed. AFAIK, devres_* do not sleep but use spin > > lock, i.e. you could render out both storm_check and storm_killed. > > > > Is there a way to flag it for freeing later while in an interrupt > context? I'm not sure where to clean it up since devm_free_irq can't be > called in tis_int_handler. > > Before diving further into that though, does anyone else have an opinion > on ripping out the irq code, and just using polling? We've been only > polling since 2015 anyways. Given these all these issues, it's quite obvious that Windows side is just polling. I'll ack a patch that removes the IRQ code for sure. /Jarkko
On Tue, Nov 24, 2020 at 10:10:21AM -0800, James Bottomley wrote: > On Tue, 2020-11-24 at 10:52 -0700, Jerry Snitselaar wrote: > > Before diving further into that though, does anyone else have an > > opinion on ripping out the irq code, and just using polling? We've > > been only polling since 2015 anyways. > > Well only a biased one, obviously: polling causes large amounts of busy > waiting, which is a waste of CPU resources and does increase the time > it takes us to do TPM operations ... not a concern if you're doing long > computation ones, like signatures, but it is a problem for short > operations like bulk updates of PCRs. The other potential issue, as we > saw with atmel is that if you prod the chip too often (which you have > to do with polling) you risk upsetting it. We've spent ages trying to > tune the polling parameters to balance reduction of busy wait with chip > upset and still, apparently, not quite got it right. If the TPM has a > functioning IRQ then it gets us out of the whole polling mess entirely. > The big question is how many chips that report an IRQ actually have a > malfunctioning one? > > James Do we have a way to know is Windows TPM code using IRQ's? /Jarkko
On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote: > Hi, > > On 11/24/20 6:52 PM, Jerry Snitselaar wrote: > > > > Jarkko Sakkinen @ 2020-11-23 20:26 MST: > > > >> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote: > >>> > >>> Matthew Garrett @ 2020-10-15 15:39 MST: > >>> > >>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: > >>>>> > >>>>> There is a misconfiguration in the bios of the gpio pin used for the > >>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis > >>>>> driver code this results in an interrupt storm. This was initially > >>>>> reported when we attempted to enable the interrupt code in the tpm_tis > >>>>> driver, which previously wasn't setting a flag to enable it. Due to > >>>>> the reports of the interrupt storm that code was reverted and we went back > >>>>> to polling instead of using interrupts. Now that we know the T490s problem > >>>>> is a firmware issue, add code to check if the system is a T490s and > >>>>> disable interrupts if that is the case. This will allow us to enable > >>>>> interrupts for everyone else. If the user has a fixed bios they can > >>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the > >>>>> kernel command line. > >>>> > >>>> I think an implication of this is that systems haven't been > >>>> well-tested with interrupts enabled. In general when we've found a > >>>> firmware issue in one place it ends up happening elsewhere as well, so > >>>> it wouldn't surprise me if there are other machines that will also be > >>>> unhappy with interrupts enabled. Would it be possible to automatically > >>>> detect this case (eg, if we get more than a certain number of > >>>> interrupts in a certain timeframe immediately after enabling the > >>>> interrupt) and automatically fall back to polling in that case? It > >>>> would also mean that users with fixed firmware wouldn't need to pass a > >>>> parameter. > >>> > >>> I believe Matthew is correct here. I found another system today > >>> with completely different vendor for both the system and the tpm chip. > >>> In addition another Lenovo model, the L490, has the issue. > >>> > >>> This initial attempt at a solution like Matthew suggested works on > >>> the system I found today, but I imagine it is all sorts of wrong. > >>> In the 2 systems where I've seen it, there are about 100000 interrupts > >>> in around 1.5 seconds, and then the irq code shuts down the interrupt > >>> because they aren't being handled. > >>> > >>> > >>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > >>> index 49ae09ac604f..478e9d02a3fa 100644 > >>> --- a/drivers/char/tpm/tpm_tis_core.c > >>> +++ b/drivers/char/tpm/tpm_tis_core.c > >>> @@ -27,6 +27,11 @@ > >>> #include "tpm.h" > >>> #include "tpm_tis_core.h" > >>> > >>> +static unsigned int time_start = 0; > >>> +static bool storm_check = true; > >>> +static bool storm_killed = false; > >>> +static u32 irqs_fired = 0; > >> > >> Maybe kstat_irqs() would be a better idea than ad hoc stats. > >> > > > > Thanks, yes that would be better. > > > >>> + > >>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); > >>> > >>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) > >>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) > >>> return rc; > >>> } > >>> > >>> -static void disable_interrupts(struct tpm_chip *chip) > >>> +static void __disable_interrupts(struct tpm_chip *chip) > >>> { > >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >>> u32 intmask; > >>> int rc; > >>> > >>> - if (priv->irq == 0) > >>> - return; > >>> - > >>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > >>> if (rc < 0) > >>> intmask = 0; > >>> > >>> intmask &= ~TPM_GLOBAL_INT_ENABLE; > >>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > >>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >>> +} > >>> + > >>> +static void disable_interrupts(struct tpm_chip *chip) > >>> +{ > >>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >>> > >>> + if (priv->irq == 0) > >>> + return; > >>> + > >>> + __disable_interrupts(chip); > >>> devm_free_irq(chip->dev.parent, priv->irq, chip); > >>> priv->irq = 0; > >>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >>> } > >>> > >>> /* > >>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) > >>> int rc, irq; > >>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >>> > >>> + if (unlikely(storm_killed)) { > >>> + devm_free_irq(chip->dev.parent, priv->irq, chip); > >>> + priv->irq = 0; > >>> + storm_killed = false; > >>> + } > >> > >> OK this kind of bad solution because if tpm_tis_send() is not called, > >> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin > >> lock, i.e. you could render out both storm_check and storm_killed. > >> > > > > Is there a way to flag it for freeing later while in an interrupt > > context? I'm not sure where to clean it up since devm_free_irq can't be > > called in tis_int_handler. > > You could add a workqueue work-struct just for this and queue that up > to do the free when you detect the storm. That will then run pretty much > immediately, avoiding the storm going on for (much) longer. That's sounds feasible. > > Before diving further into that though, does anyone else have an opinion > > on ripping out the irq code, and just using polling? We've been only > > polling since 2015 anyways. > > Given James Bottomley's reply I guess it would be worthwhile to get the > storm detection to work. OK, agreed. I take my words back from a response few minutes ago :-) > Regards, > > Hans /Jarkko
Hi All, On 11/29/20 4:23 AM, Jarkko Sakkinen wrote: > On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote: >> Hi, >> >> On 11/24/20 6:52 PM, Jerry Snitselaar wrote: >>> >>> Jarkko Sakkinen @ 2020-11-23 20:26 MST: >>> >>>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote: >>>>> >>>>> Matthew Garrett @ 2020-10-15 15:39 MST: >>>>> >>>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: >>>>>>> >>>>>>> There is a misconfiguration in the bios of the gpio pin used for the >>>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis >>>>>>> driver code this results in an interrupt storm. This was initially >>>>>>> reported when we attempted to enable the interrupt code in the tpm_tis >>>>>>> driver, which previously wasn't setting a flag to enable it. Due to >>>>>>> the reports of the interrupt storm that code was reverted and we went back >>>>>>> to polling instead of using interrupts. Now that we know the T490s problem >>>>>>> is a firmware issue, add code to check if the system is a T490s and >>>>>>> disable interrupts if that is the case. This will allow us to enable >>>>>>> interrupts for everyone else. If the user has a fixed bios they can >>>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the >>>>>>> kernel command line. >>>>>> >>>>>> I think an implication of this is that systems haven't been >>>>>> well-tested with interrupts enabled. In general when we've found a >>>>>> firmware issue in one place it ends up happening elsewhere as well, so >>>>>> it wouldn't surprise me if there are other machines that will also be >>>>>> unhappy with interrupts enabled. Would it be possible to automatically >>>>>> detect this case (eg, if we get more than a certain number of >>>>>> interrupts in a certain timeframe immediately after enabling the >>>>>> interrupt) and automatically fall back to polling in that case? It >>>>>> would also mean that users with fixed firmware wouldn't need to pass a >>>>>> parameter. >>>>> >>>>> I believe Matthew is correct here. I found another system today >>>>> with completely different vendor for both the system and the tpm chip. >>>>> In addition another Lenovo model, the L490, has the issue. >>>>> >>>>> This initial attempt at a solution like Matthew suggested works on >>>>> the system I found today, but I imagine it is all sorts of wrong. >>>>> In the 2 systems where I've seen it, there are about 100000 interrupts >>>>> in around 1.5 seconds, and then the irq code shuts down the interrupt >>>>> because they aren't being handled. >>>>> >>>>> >>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c >>>>> index 49ae09ac604f..478e9d02a3fa 100644 >>>>> --- a/drivers/char/tpm/tpm_tis_core.c >>>>> +++ b/drivers/char/tpm/tpm_tis_core.c >>>>> @@ -27,6 +27,11 @@ >>>>> #include "tpm.h" >>>>> #include "tpm_tis_core.h" >>>>> >>>>> +static unsigned int time_start = 0; >>>>> +static bool storm_check = true; >>>>> +static bool storm_killed = false; >>>>> +static u32 irqs_fired = 0; >>>> >>>> Maybe kstat_irqs() would be a better idea than ad hoc stats. >>>> >>> >>> Thanks, yes that would be better. >>> >>>>> + >>>>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); >>>>> >>>>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) >>>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) >>>>> return rc; >>>>> } >>>>> >>>>> -static void disable_interrupts(struct tpm_chip *chip) >>>>> +static void __disable_interrupts(struct tpm_chip *chip) >>>>> { >>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >>>>> u32 intmask; >>>>> int rc; >>>>> >>>>> - if (priv->irq == 0) >>>>> - return; >>>>> - >>>>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); >>>>> if (rc < 0) >>>>> intmask = 0; >>>>> >>>>> intmask &= ~TPM_GLOBAL_INT_ENABLE; >>>>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); >>>>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; >>>>> +} >>>>> + >>>>> +static void disable_interrupts(struct tpm_chip *chip) >>>>> +{ >>>>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >>>>> >>>>> + if (priv->irq == 0) >>>>> + return; >>>>> + >>>>> + __disable_interrupts(chip); >>>>> devm_free_irq(chip->dev.parent, priv->irq, chip); >>>>> priv->irq = 0; >>>>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; >>>>> } >>>>> >>>>> /* >>>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) >>>>> int rc, irq; >>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); >>>>> >>>>> + if (unlikely(storm_killed)) { >>>>> + devm_free_irq(chip->dev.parent, priv->irq, chip); >>>>> + priv->irq = 0; >>>>> + storm_killed = false; >>>>> + } >>>> >>>> OK this kind of bad solution because if tpm_tis_send() is not called, >>>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin >>>> lock, i.e. you could render out both storm_check and storm_killed. >>>> >>> >>> Is there a way to flag it for freeing later while in an interrupt >>> context? I'm not sure where to clean it up since devm_free_irq can't be >>> called in tis_int_handler. >> >> You could add a workqueue work-struct just for this and queue that up >> to do the free when you detect the storm. That will then run pretty much >> immediately, avoiding the storm going on for (much) longer. > > That's sounds feasible. > >>> Before diving further into that though, does anyone else have an opinion >>> on ripping out the irq code, and just using polling? We've been only >>> polling since 2015 anyways. >> >> Given James Bottomley's reply I guess it would be worthwhile to get the >> storm detection to work. > > OK, agreed. I take my words back from a response few minutes ago :-) :) To be clear, I think we should give the storm detection a go. Especially given the problems which James has seen with polling on some TPMs. But if that turns out to not be feasible I agree we should just either disable IRQs by default on standard x86 platforms, or just remove the IRQ support all together. Regards, Hans
On Sun, Nov 29, 2020 at 12:34:34PM +0100, Hans de Goede wrote: > Hi All, > > On 11/29/20 4:23 AM, Jarkko Sakkinen wrote: > > On Tue, Nov 24, 2020 at 10:45:01PM +0100, Hans de Goede wrote: > >> Hi, > >> > >> On 11/24/20 6:52 PM, Jerry Snitselaar wrote: > >>> > >>> Jarkko Sakkinen @ 2020-11-23 20:26 MST: > >>> > >>>> On Wed, Nov 18, 2020 at 11:36:20PM -0700, Jerry Snitselaar wrote: > >>>>> > >>>>> Matthew Garrett @ 2020-10-15 15:39 MST: > >>>>> > >>>>>> On Thu, Oct 15, 2020 at 2:44 PM Jerry Snitselaar <jsnitsel@redhat.com> wrote: > >>>>>>> > >>>>>>> There is a misconfiguration in the bios of the gpio pin used for the > >>>>>>> interrupt in the T490s. When interrupts are enabled in the tpm_tis > >>>>>>> driver code this results in an interrupt storm. This was initially > >>>>>>> reported when we attempted to enable the interrupt code in the tpm_tis > >>>>>>> driver, which previously wasn't setting a flag to enable it. Due to > >>>>>>> the reports of the interrupt storm that code was reverted and we went back > >>>>>>> to polling instead of using interrupts. Now that we know the T490s problem > >>>>>>> is a firmware issue, add code to check if the system is a T490s and > >>>>>>> disable interrupts if that is the case. This will allow us to enable > >>>>>>> interrupts for everyone else. If the user has a fixed bios they can > >>>>>>> force the enabling of interrupts with tpm_tis.interrupts=1 on the > >>>>>>> kernel command line. > >>>>>> > >>>>>> I think an implication of this is that systems haven't been > >>>>>> well-tested with interrupts enabled. In general when we've found a > >>>>>> firmware issue in one place it ends up happening elsewhere as well, so > >>>>>> it wouldn't surprise me if there are other machines that will also be > >>>>>> unhappy with interrupts enabled. Would it be possible to automatically > >>>>>> detect this case (eg, if we get more than a certain number of > >>>>>> interrupts in a certain timeframe immediately after enabling the > >>>>>> interrupt) and automatically fall back to polling in that case? It > >>>>>> would also mean that users with fixed firmware wouldn't need to pass a > >>>>>> parameter. > >>>>> > >>>>> I believe Matthew is correct here. I found another system today > >>>>> with completely different vendor for both the system and the tpm chip. > >>>>> In addition another Lenovo model, the L490, has the issue. > >>>>> > >>>>> This initial attempt at a solution like Matthew suggested works on > >>>>> the system I found today, but I imagine it is all sorts of wrong. > >>>>> In the 2 systems where I've seen it, there are about 100000 interrupts > >>>>> in around 1.5 seconds, and then the irq code shuts down the interrupt > >>>>> because they aren't being handled. > >>>>> > >>>>> > >>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c > >>>>> index 49ae09ac604f..478e9d02a3fa 100644 > >>>>> --- a/drivers/char/tpm/tpm_tis_core.c > >>>>> +++ b/drivers/char/tpm/tpm_tis_core.c > >>>>> @@ -27,6 +27,11 @@ > >>>>> #include "tpm.h" > >>>>> #include "tpm_tis_core.h" > >>>>> > >>>>> +static unsigned int time_start = 0; > >>>>> +static bool storm_check = true; > >>>>> +static bool storm_killed = false; > >>>>> +static u32 irqs_fired = 0; > >>>> > >>>> Maybe kstat_irqs() would be a better idea than ad hoc stats. > >>>> > >>> > >>> Thanks, yes that would be better. > >>> > >>>>> + > >>>>> static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value); > >>>>> > >>>>> static void tpm_tis_enable_interrupt(struct tpm_chip *chip, u8 mask) > >>>>> @@ -464,25 +469,31 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) > >>>>> return rc; > >>>>> } > >>>>> > >>>>> -static void disable_interrupts(struct tpm_chip *chip) > >>>>> +static void __disable_interrupts(struct tpm_chip *chip) > >>>>> { > >>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >>>>> u32 intmask; > >>>>> int rc; > >>>>> > >>>>> - if (priv->irq == 0) > >>>>> - return; > >>>>> - > >>>>> rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask); > >>>>> if (rc < 0) > >>>>> intmask = 0; > >>>>> > >>>>> intmask &= ~TPM_GLOBAL_INT_ENABLE; > >>>>> rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask); > >>>>> + chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >>>>> +} > >>>>> + > >>>>> +static void disable_interrupts(struct tpm_chip *chip) > >>>>> +{ > >>>>> + struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >>>>> > >>>>> + if (priv->irq == 0) > >>>>> + return; > >>>>> + > >>>>> + __disable_interrupts(chip); > >>>>> devm_free_irq(chip->dev.parent, priv->irq, chip); > >>>>> priv->irq = 0; > >>>>> - chip->flags &= ~TPM_CHIP_FLAG_IRQ; > >>>>> } > >>>>> > >>>>> /* > >>>>> @@ -528,6 +539,12 @@ static int tpm_tis_send(struct tpm_chip *chip, u8 *buf, size_t len) > >>>>> int rc, irq; > >>>>> struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > >>>>> > >>>>> + if (unlikely(storm_killed)) { > >>>>> + devm_free_irq(chip->dev.parent, priv->irq, chip); > >>>>> + priv->irq = 0; > >>>>> + storm_killed = false; > >>>>> + } > >>>> > >>>> OK this kind of bad solution because if tpm_tis_send() is not called, > >>>> then IRQ is never freed. AFAIK, devres_* do not sleep but use spin > >>>> lock, i.e. you could render out both storm_check and storm_killed. > >>>> > >>> > >>> Is there a way to flag it for freeing later while in an interrupt > >>> context? I'm not sure where to clean it up since devm_free_irq can't be > >>> called in tis_int_handler. > >> > >> You could add a workqueue work-struct just for this and queue that up > >> to do the free when you detect the storm. That will then run pretty much > >> immediately, avoiding the storm going on for (much) longer. > > > > That's sounds feasible. > > > >>> Before diving further into that though, does anyone else have an opinion > >>> on ripping out the irq code, and just using polling? We've been only > >>> polling since 2015 anyways. > >> > >> Given James Bottomley's reply I guess it would be worthwhile to get the > >> storm detection to work. > > > > OK, agreed. I take my words back from a response few minutes ago :-) > > :) > > To be clear, I think we should give the storm detection a go. Especially > given the problems which James has seen with polling on some TPMs. > > But if that turns out to not be feasible I agree we should just either > disable IRQs by default on standard x86 platforms, or just remove the > IRQ support all together. Just for completeness: one option is also to whitelist IRQ's. > Regards, > > Hans /Jarkko
diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c index 0b214963539d..4ed6e660273a 100644 --- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -27,6 +27,7 @@ #include <linux/of.h> #include <linux/of_device.h> #include <linux/kernel.h> +#include <linux/dmi.h> #include "tpm.h" #include "tpm_tis_core.h" @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da return container_of(data, struct tpm_tis_tcg_phy, priv); } -static bool interrupts = true; -module_param(interrupts, bool, 0444); +static int interrupts = -1; +module_param(interrupts, int, 0444); MODULE_PARM_DESC(interrupts, "Enable interrupts"); static bool itpm; @@ -63,6 +64,28 @@ module_param(force, bool, 0444); MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry"); #endif +static int tpm_tis_disable_irq(const struct dmi_system_id *d) +{ + if (interrupts == -1) { + pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident); + interrupts = 0; + } + + return 0; +} + +static const struct dmi_system_id tpm_tis_dmi_table[] = { + { + .callback = tpm_tis_disable_irq, + .ident = "ThinkPad T490s", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"), + }, + }, + {} +}; + #if defined(CONFIG_PNP) && defined(CONFIG_ACPI) static int has_hid(struct acpi_device *dev, const char *hid) { @@ -192,6 +215,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info) int irq = -1; int rc; + dmi_check_system(tpm_tis_dmi_table); + rc = check_acpi_tpm2(dev); if (rc) return rc;