mbox series

[v2,0/5] tpm_tis: fix interrupts (again)

Message ID 20201001180925.13808-1-James.Bottomley@HansenPartnership.com (mailing list archive)
Headers show
Series tpm_tis: fix interrupts (again) | expand

Message

James Bottomley Oct. 1, 2020, 6:09 p.m. UTC
The current state of the TIS TPM is that interrupts have been globally
disabled by various changes.  The problems we got reported the last
time they were enabled was interrupt storms.  With my own TIS TPM,
I've found that this is caused because my TPM doesn't do legacy
cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
requires any TIS TPM without legacy cycles not to act on any write to
an interrupt register unless the locality is enabled.  This means if
an interrupt fires after we relinquish the locality, the TPM_EOI in
the interrupt routine is ineffective meaning the same interrupt
triggers over and over again.  This problem also means we can have
trouble setting up interrupts on TIS TPMs because the current init
code does the setup before the locality is claimed for the first time.

James

---

James Bottomley (5):
  tpm_tis: Fix check_locality for correct locality acquisition
  tpm_tis: Clean up locality release
  tpm_tis: Fix interrupts for TIS TPMs without legacy cycles
  tpm_tis: fix IRQ probing
  Revert "tpm: Revert "tpm_tis_core: Turn on the TPM before probing
    IRQ's""

 drivers/char/tpm/tpm_tis_core.c | 185 ++++++++++++++++++++------------
 1 file changed, 117 insertions(+), 68 deletions(-)

Comments

Jerry Snitselaar Oct. 12, 2020, 5:39 a.m. UTC | #1
James Bottomley @ 2020-10-01 11:09 MST:

> The current state of the TIS TPM is that interrupts have been globally
> disabled by various changes.  The problems we got reported the last
> time they were enabled was interrupt storms.  With my own TIS TPM,
> I've found that this is caused because my TPM doesn't do legacy
> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
> requires any TIS TPM without legacy cycles not to act on any write to
> an interrupt register unless the locality is enabled.  This means if
> an interrupt fires after we relinquish the locality, the TPM_EOI in
> the interrupt routine is ineffective meaning the same interrupt
> triggers over and over again.  This problem also means we can have
> trouble setting up interrupts on TIS TPMs because the current init
> code does the setup before the locality is claimed for the first time.
>
> James
>


I tested initially with the following commits reverted:

aa4a63dd9816 tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's" | 2020-01-06 | (Stefan Berger)
dda8b2af395b tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts" | 2020-01-06 | (Stefan Berger)

The laptop doesn't become unusable, but I lose the trackpad and get the following:

[    3.070501] irq 31: nobody cared (try booting with the "irqpoll" option)
[    3.070504] CPU: 2 PID: 251 Comm: rngd Not tainted 5.8.13-201.local.fc32.x86_64 #1
[    3.070504] Hardware name: LENOVO 20NYS7K90F/20NYS7K90F, BIOS N2JET83W (1.61 ) 11/22/2019
[    3.070505] Call Trace:
[    3.070511]  dump_stack+0x6b/0x88
[    3.070514]  __report_bad_irq+0x35/0xa7
[    3.070516]  note_interrupt.cold+0xb/0x6a
[    3.070518]  handle_irq_event+0x88/0x8a
[    3.070519]  handle_fasteoi_irq+0x78/0x1c0
[    3.070522]  common_interrupt+0x68/0x140
[    3.070524]  ? asm_common_interrupt+0x8/0x40
[    3.070525]  asm_common_interrupt+0x1e/0x40
[    3.070527] RIP: 0033:0x7f2eea3249b5
[    3.070529] Code: e0 48 c1 e8 3c 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 37 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 1e 83 e0 01 48 31 45 f0 <48> 8b 45 e0 48 c1 e8 1b 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8
[    3.070529] RSP: 002b:00007f2ee3ffebc0 EFLAGS: 00000202
[    3.070530] RAX: 0000000000000001 RBX: 000000000000000a RCX: 000000000000002e
[    3.070531] RDX: 0463400000000000 RSI: 0000000000000000 RDI: 0000000000000001
[    3.070532] RBP: 00007f2ee3ffec10 R08: 0000000000000000 R09: 0000000000000035
[    3.070532] R10: 00007ffde716f080 R11: 00007ffde716f080 R12: 00007f2edc004c00
[    3.070533] R13: 00007ffde700a54f R14: 00007f2ee3ffed10 R15: 00005598a41e3680
[    3.070534] handlers:
[    3.070537] [<000000007b6f3232>] tis_int_handler
[    3.070538] Disabling IRQ #31
...
[   14.956342] irq 16: nobody cared (try booting with the "irqpoll" option)
[   14.956344] CPU: 0 PID: 1013 Comm: rngd Not tainted 5.8.13-201.local.fc32.x86_64 #1
[   14.956344] Hardware name: LENOVO 20NYS7K90F/20NYS7K90F, BIOS N2JET83W (1.61 ) 11/22/2019
[   14.956345] Call Trace:
[   14.956350]  dump_stack+0x6b/0x88
[   14.956353]  __report_bad_irq+0x35/0xa7
[   14.956354]  note_interrupt.cold+0xb/0x6a
[   14.956355]  handle_irq_event+0x88/0x8a
[   14.956356]  handle_fasteoi_irq+0x78/0x1c0
[   14.956358]  common_interrupt+0x68/0x140
[   14.956360]  ? asm_common_interrupt+0x8/0x40
[   14.956361]  asm_common_interrupt+0x1e/0x40
[   14.956362] RIP: 0033:0x7fcaff4399d3
[   14.956363] Code: e0 48 c1 e8 1e 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 1b 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 16 83 e0 01 48 31 45 f0 <48> d1 65 e0 48 8b 45 f0 48 31 45 e0 83 45 d4 01 83 7d d4 40 0f 86
[   14.956364] RSP: 002b:00007fcafe544bc0 EFLAGS: 00000246
[   14.956364] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000026
[   14.956365] RDX: 000df20000000000 RSI: 0000000000000000 RDI: 0000000000000001
[   14.956365] RBP: 00007fcafe544c10 R08: 0000000000000000 R09: 0000000000000035
[   14.956366] R10: 00007ffc30bd2080 R11: 00007ffc30bd2080 R12: 00007fcaf0004c00
[   14.956366] R13: 00007fcaf0004c00 R14: 00007fcaf0000b60 R15: 0000560216fcf0f2
[   14.956367] handlers:
[   14.956370] [<0000000024c0571e>] i801_isr [i2c_i801]
[   14.956371] Disabling IRQ #16

/proc/interrupts shows:

  16:     100000          0          0          0          0          0          0          0  IR-IO-APIC   16-fasteoi   i801_smbus
  31:          0          0     100000          0          0          0          0          0  IR-IO-APIC   31-fasteoi   tpm0


I also get this behavior with your patchset applied. If I boot with
tpm_tis.interrupts=0 it behaves.  The thought from Hans is to look at
the dmi info and key off the vendor being Lenovo and bios date to
decide if interrupts should be disabled. Since Dan ran also into this
with something internal at Intel I don't know if that will be
sufficient.

I hope to look over this patchset tomorrow.

Regards,
Jerry
Jarkko Sakkinen Oct. 13, 2020, 1:17 a.m. UTC | #2
On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
> The current state of the TIS TPM is that interrupts have been globally
> disabled by various changes.  The problems we got reported the last
> time they were enabled was interrupt storms.  With my own TIS TPM,
> I've found that this is caused because my TPM doesn't do legacy
> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
> requires any TIS TPM without legacy cycles not to act on any write to
> an interrupt register unless the locality is enabled.  This means if
> an interrupt fires after we relinquish the locality, the TPM_EOI in
> the interrupt routine is ineffective meaning the same interrupt
> triggers over and over again.  This problem also means we can have
> trouble setting up interrupts on TIS TPMs because the current init
> code does the setup before the locality is claimed for the first time.
> 
> James

You should consider expanding the audience. Jerry, once you have some
bandwidth (no rush, does not land before rc2), it would be great that if
you could try this. I'm emphasizing this just because of the
intersection. I think it would also make senset to get tested-by from
Nayna.

Speaking of the changelog, I almost never have encounter a patch set
that does not have a changelog in the cover letter. And I'm not able to
interpret the process text in a way that it would ask to scatter
changelogs to the patches, and not put them to the cover letter [*].

Thus, I trust what I see as commodity. Not only that but it also helps a
lot with to see quickly what has changed, especially if the patches are
such that you have to take them in eventually.

In my own patch sets, I've recently started to xref associated lore korg
discussions in the change log entries, when it applies. I started to do
this with the SGX series when I had missed to address a number of Boris'
comments. It has helped a lot with my own tracking and I assume it is
also helpful for reviewers and maintainers to get the context, when they
need it.

The last paragraph is not something I demand. I'm merely just mentioning
it because I think it is a good practice for any patch set.

[*] https://lore.kernel.org/linux-integrity/14edea1f5092c2b8442165756b2ee32e56bed1eb.camel@HansenPartnership.com/

/Jarkko
Jarkko Sakkinen Oct. 13, 2020, 1:23 a.m. UTC | #3
On Sun, Oct 11, 2020 at 10:39:07PM -0700, Jerry Snitselaar wrote:
> 
> James Bottomley @ 2020-10-01 11:09 MST:
> 
> > The current state of the TIS TPM is that interrupts have been globally
> > disabled by various changes.  The problems we got reported the last
> > time they were enabled was interrupt storms.  With my own TIS TPM,
> > I've found that this is caused because my TPM doesn't do legacy
> > cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
> > requires any TIS TPM without legacy cycles not to act on any write to
> > an interrupt register unless the locality is enabled.  This means if
> > an interrupt fires after we relinquish the locality, the TPM_EOI in
> > the interrupt routine is ineffective meaning the same interrupt
> > triggers over and over again.  This problem also means we can have
> > trouble setting up interrupts on TIS TPMs because the current init
> > code does the setup before the locality is claimed for the first time.
> >
> > James
> >
> 
> 
> I tested initially with the following commits reverted:
> 
> aa4a63dd9816 tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's" | 2020-01-06 | (Stefan Berger)
> dda8b2af395b tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts" | 2020-01-06 | (Stefan Berger)
> 
> The laptop doesn't become unusable, but I lose the trackpad and get the following:
> 
> [    3.070501] irq 31: nobody cared (try booting with the "irqpoll" option)
> [    3.070504] CPU: 2 PID: 251 Comm: rngd Not tainted 5.8.13-201.local.fc32.x86_64 #1
> [    3.070504] Hardware name: LENOVO 20NYS7K90F/20NYS7K90F, BIOS N2JET83W (1.61 ) 11/22/2019
> [    3.070505] Call Trace:
> [    3.070511]  dump_stack+0x6b/0x88
> [    3.070514]  __report_bad_irq+0x35/0xa7
> [    3.070516]  note_interrupt.cold+0xb/0x6a
> [    3.070518]  handle_irq_event+0x88/0x8a
> [    3.070519]  handle_fasteoi_irq+0x78/0x1c0
> [    3.070522]  common_interrupt+0x68/0x140
> [    3.070524]  ? asm_common_interrupt+0x8/0x40
> [    3.070525]  asm_common_interrupt+0x1e/0x40
> [    3.070527] RIP: 0033:0x7f2eea3249b5
> [    3.070529] Code: e0 48 c1 e8 3c 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 37 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 1e 83 e0 01 48 31 45 f0 <48> 8b 45 e0 48 c1 e8 1b 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8
> [    3.070529] RSP: 002b:00007f2ee3ffebc0 EFLAGS: 00000202
> [    3.070530] RAX: 0000000000000001 RBX: 000000000000000a RCX: 000000000000002e
> [    3.070531] RDX: 0463400000000000 RSI: 0000000000000000 RDI: 0000000000000001
> [    3.070532] RBP: 00007f2ee3ffec10 R08: 0000000000000000 R09: 0000000000000035
> [    3.070532] R10: 00007ffde716f080 R11: 00007ffde716f080 R12: 00007f2edc004c00
> [    3.070533] R13: 00007ffde700a54f R14: 00007f2ee3ffed10 R15: 00005598a41e3680
> [    3.070534] handlers:
> [    3.070537] [<000000007b6f3232>] tis_int_handler
> [    3.070538] Disabling IRQ #31
> ...
> [   14.956342] irq 16: nobody cared (try booting with the "irqpoll" option)
> [   14.956344] CPU: 0 PID: 1013 Comm: rngd Not tainted 5.8.13-201.local.fc32.x86_64 #1
> [   14.956344] Hardware name: LENOVO 20NYS7K90F/20NYS7K90F, BIOS N2JET83W (1.61 ) 11/22/2019
> [   14.956345] Call Trace:
> [   14.956350]  dump_stack+0x6b/0x88
> [   14.956353]  __report_bad_irq+0x35/0xa7
> [   14.956354]  note_interrupt.cold+0xb/0x6a
> [   14.956355]  handle_irq_event+0x88/0x8a
> [   14.956356]  handle_fasteoi_irq+0x78/0x1c0
> [   14.956358]  common_interrupt+0x68/0x140
> [   14.956360]  ? asm_common_interrupt+0x8/0x40
> [   14.956361]  asm_common_interrupt+0x1e/0x40
> [   14.956362] RIP: 0033:0x7fcaff4399d3
> [   14.956363] Code: e0 48 c1 e8 1e 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 1b 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 16 83 e0 01 48 31 45 f0 <48> d1 65 e0 48 8b 45 f0 48 31 45 e0 83 45 d4 01 83 7d d4 40 0f 86
> [   14.956364] RSP: 002b:00007fcafe544bc0 EFLAGS: 00000246
> [   14.956364] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000026
> [   14.956365] RDX: 000df20000000000 RSI: 0000000000000000 RDI: 0000000000000001
> [   14.956365] RBP: 00007fcafe544c10 R08: 0000000000000000 R09: 0000000000000035
> [   14.956366] R10: 00007ffc30bd2080 R11: 00007ffc30bd2080 R12: 00007fcaf0004c00
> [   14.956366] R13: 00007fcaf0004c00 R14: 00007fcaf0000b60 R15: 0000560216fcf0f2
> [   14.956367] handlers:
> [   14.956370] [<0000000024c0571e>] i801_isr [i2c_i801]
> [   14.956371] Disabling IRQ #16
> 
> /proc/interrupts shows:
> 
>   16:     100000          0          0          0          0          0          0          0  IR-IO-APIC   16-fasteoi   i801_smbus
>   31:          0          0     100000          0          0          0          0          0  IR-IO-APIC   31-fasteoi   tpm0
> 
> 
> I also get this behavior with your patchset applied. If I boot with
> tpm_tis.interrupts=0 it behaves.  The thought from Hans is to look at
> the dmi info and key off the vendor being Lenovo and bios date to
> decide if interrupts should be disabled. Since Dan ran also into this
> with something internal at Intel I don't know if that will be
> sufficient.
> 
> I hope to look over this patchset tomorrow.
> 
> Regards,
> Jerry

I'm sorry, I received this email only after pulling the latest with fdm.

/Jarkko
Jerry Snitselaar Oct. 13, 2020, 3:15 p.m. UTC | #4
Jarkko Sakkinen @ 2020-10-12 18:17 MST:

> On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
>> The current state of the TIS TPM is that interrupts have been globally
>> disabled by various changes.  The problems we got reported the last
>> time they were enabled was interrupt storms.  With my own TIS TPM,
>> I've found that this is caused because my TPM doesn't do legacy
>> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
>> requires any TIS TPM without legacy cycles not to act on any write to
>> an interrupt register unless the locality is enabled.  This means if
>> an interrupt fires after we relinquish the locality, the TPM_EOI in
>> the interrupt routine is ineffective meaning the same interrupt
>> triggers over and over again.  This problem also means we can have
>> trouble setting up interrupts on TIS TPMs because the current init
>> code does the setup before the locality is claimed for the first time.
>> 
>> James
>
> You should consider expanding the audience. Jerry, once you have some
> bandwidth (no rush, does not land before rc2), it would be great that if
> you could try this. I'm emphasizing this just because of the
> intersection. I think it would also make senset to get tested-by from
> Nayna.

I will run some tests on some other systems I have access to. As noted
in the other email I did a quick test with a t490s with an older bios
that exhibits the problem originally reported when Stefan's patch
enabled interrupts.

>
> Speaking of the changelog, I almost never have encounter a patch set
> that does not have a changelog in the cover letter. And I'm not able to
> interpret the process text in a way that it would ask to scatter
> changelogs to the patches, and not put them to the cover letter [*].
>
> Thus, I trust what I see as commodity. Not only that but it also helps a
> lot with to see quickly what has changed, especially if the patches are
> such that you have to take them in eventually.
>
> In my own patch sets, I've recently started to xref associated lore korg
> discussions in the change log entries, when it applies. I started to do
> this with the SGX series when I had missed to address a number of Boris'
> comments. It has helped a lot with my own tracking and I assume it is
> also helpful for reviewers and maintainers to get the context, when they
> need it.
>
> The last paragraph is not something I demand. I'm merely just mentioning
> it because I think it is a good practice for any patch set.
>
> [*] https://lore.kernel.org/linux-integrity/14edea1f5092c2b8442165756b2ee32e56bed1eb.camel@HansenPartnership.com/
>
> /Jarkko
James Bottomley Oct. 13, 2020, 3:24 p.m. UTC | #5
On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> 
> > On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
> > > The current state of the TIS TPM is that interrupts have been
> > > globally disabled by various changes.  The problems we got
> > > reported the last time they were enabled was interrupt
> > > storms.  With my own TIS TPM, I've found that this is caused
> > > because my TPM doesn't do legacy cycles, The TIS spec (chapter
> > > 6.1 "Locality Usage Per Register") requires any TIS TPM without
> > > legacy cycles not to act on any write to an interrupt register
> > > unless the locality is enabled.  This means if an interrupt fires
> > > after we relinquish the locality, the TPM_EOI in the interrupt
> > > routine is ineffective meaning the same interrupt triggers over
> > > and over again.  This problem also means we can have trouble
> > > setting up interrupts on TIS TPMs because the current init
> > > code does the setup before the locality is claimed for the first
> > > time.
> > > 
> > > James
> > 
> > You should consider expanding the audience.

Well, most people interested in testing this sort of thing are already
on the integrity list.

> >  Jerry, once you have some bandwidth (no rush, does not land before
> > rc2), it would be great that if you could try this. I'm emphasizing
> > this just because of the intersection. I think it would also make
> > senset to get tested-by from Nayna.
> 
> I will run some tests on some other systems I have access to. As
> noted in the other email I did a quick test with a t490s with an
> older bios that exhibits the problem originally reported when
> Stefan's patch enabled interrupts.

Well, it means there's still some other problem.  I was hoping that
because the rainbow pass system originally exhibited the same symptoms
(interrupt storm) fixing it would also fix the t490 and the ineffective
EOI bug looked like a great candidate for being the root cause.

How amenable are you to debugging this?  I originally figured out the
problem with the rainbow pass by putting ratelimited printks in the
interrupt routine and most of the TIS transmission ones, but it's
somewhat labour intensive doing it this way.

James
Jerry Snitselaar Oct. 13, 2020, 4:05 p.m. UTC | #6
James Bottomley @ 2020-10-13 08:24 MST:

> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>> 
>> > On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
>> > > The current state of the TIS TPM is that interrupts have been
>> > > globally disabled by various changes.  The problems we got
>> > > reported the last time they were enabled was interrupt
>> > > storms.  With my own TIS TPM, I've found that this is caused
>> > > because my TPM doesn't do legacy cycles, The TIS spec (chapter
>> > > 6.1 "Locality Usage Per Register") requires any TIS TPM without
>> > > legacy cycles not to act on any write to an interrupt register
>> > > unless the locality is enabled.  This means if an interrupt fires
>> > > after we relinquish the locality, the TPM_EOI in the interrupt
>> > > routine is ineffective meaning the same interrupt triggers over
>> > > and over again.  This problem also means we can have trouble
>> > > setting up interrupts on TIS TPMs because the current init
>> > > code does the setup before the locality is claimed for the first
>> > > time.
>> > > 
>> > > James
>> > 
>> > You should consider expanding the audience.
>
> Well, most people interested in testing this sort of thing are already
> on the integrity list.
>
>> >  Jerry, once you have some bandwidth (no rush, does not land before
>> > rc2), it would be great that if you could try this. I'm emphasizing
>> > this just because of the intersection. I think it would also make
>> > senset to get tested-by from Nayna.
>> 
>> I will run some tests on some other systems I have access to. As
>> noted in the other email I did a quick test with a t490s with an
>> older bios that exhibits the problem originally reported when
>> Stefan's patch enabled interrupts.
>
> Well, it means there's still some other problem.  I was hoping that
> because the rainbow pass system originally exhibited the same symptoms
> (interrupt storm) fixing it would also fix the t490 and the ineffective
> EOI bug looked like a great candidate for being the root cause.
>

Adding Hans to the list.

IIUC in the t490s case the problem lies with the hardware itself. Hans,
is that correct?

> How amenable are you to debugging this?  I originally figured out the
> problem with the rainbow pass by putting ratelimited printks in the
> interrupt routine and most of the TIS transmission ones, but it's
> somewhat labour intensive doing it this way.
>
> James
Hans de Goede Oct. 14, 2020, 3:03 p.m. UTC | #7
Hi,

On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
> 
> James Bottomley @ 2020-10-13 08:24 MST:
> 
>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>>>
>>>> On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
>>>>> The current state of the TIS TPM is that interrupts have been
>>>>> globally disabled by various changes.  The problems we got
>>>>> reported the last time they were enabled was interrupt
>>>>> storms.  With my own TIS TPM, I've found that this is caused
>>>>> because my TPM doesn't do legacy cycles, The TIS spec (chapter
>>>>> 6.1 "Locality Usage Per Register") requires any TIS TPM without
>>>>> legacy cycles not to act on any write to an interrupt register
>>>>> unless the locality is enabled.  This means if an interrupt fires
>>>>> after we relinquish the locality, the TPM_EOI in the interrupt
>>>>> routine is ineffective meaning the same interrupt triggers over
>>>>> and over again.  This problem also means we can have trouble
>>>>> setting up interrupts on TIS TPMs because the current init
>>>>> code does the setup before the locality is claimed for the first
>>>>> time.
>>>>>
>>>>> James
>>>>
>>>> You should consider expanding the audience.
>>
>> Well, most people interested in testing this sort of thing are already
>> on the integrity list.
>>
>>>>   Jerry, once you have some bandwidth (no rush, does not land before
>>>> rc2), it would be great that if you could try this. I'm emphasizing
>>>> this just because of the intersection. I think it would also make
>>>> senset to get tested-by from Nayna.
>>>
>>> I will run some tests on some other systems I have access to. As
>>> noted in the other email I did a quick test with a t490s with an
>>> older bios that exhibits the problem originally reported when
>>> Stefan's patch enabled interrupts.
>>
>> Well, it means there's still some other problem.  I was hoping that
>> because the rainbow pass system originally exhibited the same symptoms
>> (interrupt storm) fixing it would also fix the t490 and the ineffective
>> EOI bug looked like a great candidate for being the root cause.
>>
> 
> Adding Hans to the list.
> 
> IIUC in the t490s case the problem lies with the hardware itself. Hans,
> is that correct?

More or less. AFAIK / have been told by Lenovo it is an issue with the
configuration of the inerrupt-type of the GPIO pin used for the IRQ,
which is a firmware issue which could be fixed by a BIOS update
(the pin is setup as a direct-irq pin for the APIC, so the OS has no
control of the IRQ type since with APIC irqs this is all supposed to
be setup properly before hand).

But it is a model specific issue, if we denylist IRQ usage on this
Lenovo model (and probably a few others) then we should be able to
restore the IRQ code to normal functionality for all other device
models which declare an IRQ in their resource tables.

Regards,

Hans
James Bottomley Oct. 14, 2020, 3:23 p.m. UTC | #8
On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
> > James Bottomley @ 2020-10-13 08:24 MST:
> > > On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
> > > > Jarkko Sakkinen @ 2020-10-12 18:17 MST:
[...]
> > > > >   Jerry, once you have some bandwidth (no rush, does not land
> > > > > before rc2), it would be great that if you could try this.
> > > > > I'm emphasizing this just because of the intersection. I
> > > > > think it would also make senset to get tested-by from Nayna.
> > > > 
> > > > I will run some tests on some other systems I have access to.
> > > > As noted in the other email I did a quick test with a t490s
> > > > with an older bios that exhibits the problem originally
> > > > reported when Stefan's patch enabled interrupts.
> > > 
> > > Well, it means there's still some other problem.  I was hoping
> > > that because the rainbow pass system originally exhibited the
> > > same symptoms (interrupt storm) fixing it would also fix the t490
> > > and the ineffective EOI bug looked like a great candidate for
> > > being the root cause.
> > > 
> > 
> > Adding Hans to the list.
> > 
> > IIUC in the t490s case the problem lies with the hardware itself.
> > Hans, is that correct?
> 
> More or less. AFAIK / have been told by Lenovo it is an issue with
> the configuration of the inerrupt-type of the GPIO pin used for the
> IRQ, which is a firmware issue which could be fixed by a BIOS update
> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
> control of the IRQ type since with APIC irqs this is all supposed to
> be setup properly before hand).
> 
> But it is a model specific issue, if we denylist IRQ usage on this
> Lenovo model (and probably a few others) then we should be able to
> restore the IRQ code to normal functionality for all other device
> models which declare an IRQ in their resource tables.

I can do that with a quirk, but how do I identify the device?  TPM
manufacturer and version? or do I have to use something like the ACPI
bios version?

James
Hans de Goede Oct. 14, 2020, 4:04 p.m. UTC | #9
Hi,

On 10/14/20 5:23 PM, James Bottomley wrote:
> On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>>> James Bottomley @ 2020-10-13 08:24 MST:
>>>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> [...]
>>>>>>    Jerry, once you have some bandwidth (no rush, does not land
>>>>>> before rc2), it would be great that if you could try this.
>>>>>> I'm emphasizing this just because of the intersection. I
>>>>>> think it would also make senset to get tested-by from Nayna.
>>>>>
>>>>> I will run some tests on some other systems I have access to.
>>>>> As noted in the other email I did a quick test with a t490s
>>>>> with an older bios that exhibits the problem originally
>>>>> reported when Stefan's patch enabled interrupts.
>>>>
>>>> Well, it means there's still some other problem.  I was hoping
>>>> that because the rainbow pass system originally exhibited the
>>>> same symptoms (interrupt storm) fixing it would also fix the t490
>>>> and the ineffective EOI bug looked like a great candidate for
>>>> being the root cause.
>>>>
>>>
>>> Adding Hans to the list.
>>>
>>> IIUC in the t490s case the problem lies with the hardware itself.
>>> Hans, is that correct?
>>
>> More or less. AFAIK / have been told by Lenovo it is an issue with
>> the configuration of the inerrupt-type of the GPIO pin used for the
>> IRQ, which is a firmware issue which could be fixed by a BIOS update
>> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
>> control of the IRQ type since with APIC irqs this is all supposed to
>> be setup properly before hand).
>>
>> But it is a model specific issue, if we denylist IRQ usage on this
>> Lenovo model (and probably a few others) then we should be able to
>> restore the IRQ code to normal functionality for all other device
>> models which declare an IRQ in their resource tables.
> 
> I can do that with a quirk, but how do I identify the device?  TPM
> manufacturer and version? or do I have to use something like the ACPI
> bios version?

I'm not sure if the TPM ids are unique to one model/series of laptops.

So my idea for this was to match on DMI strings, specifically
use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
devices the string which you expect to be in DMI_PRODUCT_NAME
is actually in DMI_PRODUCT_VERSION).

You can easily get the strings for your device by doing:

cat /sys/class/dmi/id/sys_vendor
cat /sys/class/dmi/id/product_version

Regards,

Hans
Jerry Snitselaar Oct. 14, 2020, 4:34 p.m. UTC | #10
Hans de Goede @ 2020-10-14 09:04 MST:

> Hi,
>
> On 10/14/20 5:23 PM, James Bottomley wrote:
>> On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>>> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>>>> James Bottomley @ 2020-10-13 08:24 MST:
>>>>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>>>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>> [...]
>>>>>>>    Jerry, once you have some bandwidth (no rush, does not land
>>>>>>> before rc2), it would be great that if you could try this.
>>>>>>> I'm emphasizing this just because of the intersection. I
>>>>>>> think it would also make senset to get tested-by from Nayna.
>>>>>>
>>>>>> I will run some tests on some other systems I have access to.
>>>>>> As noted in the other email I did a quick test with a t490s
>>>>>> with an older bios that exhibits the problem originally
>>>>>> reported when Stefan's patch enabled interrupts.
>>>>>
>>>>> Well, it means there's still some other problem.  I was hoping
>>>>> that because the rainbow pass system originally exhibited the
>>>>> same symptoms (interrupt storm) fixing it would also fix the t490
>>>>> and the ineffective EOI bug looked like a great candidate for
>>>>> being the root cause.
>>>>>
>>>>
>>>> Adding Hans to the list.
>>>>
>>>> IIUC in the t490s case the problem lies with the hardware itself.
>>>> Hans, is that correct?
>>>
>>> More or less. AFAIK / have been told by Lenovo it is an issue with
>>> the configuration of the inerrupt-type of the GPIO pin used for the
>>> IRQ, which is a firmware issue which could be fixed by a BIOS update
>>> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
>>> control of the IRQ type since with APIC irqs this is all supposed to
>>> be setup properly before hand).
>>>
>>> But it is a model specific issue, if we denylist IRQ usage on this
>>> Lenovo model (and probably a few others) then we should be able to
>>> restore the IRQ code to normal functionality for all other device
>>> models which declare an IRQ in their resource tables.
>> I can do that with a quirk, but how do I identify the device?  TPM
>> manufacturer and version? or do I have to use something like the ACPI
>> bios version?
>
> I'm not sure if the TPM ids are unique to one model/series of laptops.
>
> So my idea for this was to match on DMI strings, specifically
> use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
> strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
> devices the string which you expect to be in DMI_PRODUCT_NAME
> is actually in DMI_PRODUCT_VERSION).
>
> You can easily get the strings for your device by doing:
>
> cat /sys/class/dmi/id/sys_vendor
> cat /sys/class/dmi/id/product_version
>
> Regards,
>
> Hans

Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
if the bios is older than the fixed bios? Has Lenovo
released the fixed bios?
Hans de Goede Oct. 14, 2020, 4:46 p.m. UTC | #11
Hi,

On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
> 
> Hans de Goede @ 2020-10-14 09:04 MST:
> 
>> Hi,
>>
>> On 10/14/20 5:23 PM, James Bottomley wrote:
>>> On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>>>> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>>>>> James Bottomley @ 2020-10-13 08:24 MST:
>>>>>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>>>>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>>> [...]
>>>>>>>>     Jerry, once you have some bandwidth (no rush, does not land
>>>>>>>> before rc2), it would be great that if you could try this.
>>>>>>>> I'm emphasizing this just because of the intersection. I
>>>>>>>> think it would also make senset to get tested-by from Nayna.
>>>>>>>
>>>>>>> I will run some tests on some other systems I have access to.
>>>>>>> As noted in the other email I did a quick test with a t490s
>>>>>>> with an older bios that exhibits the problem originally
>>>>>>> reported when Stefan's patch enabled interrupts.
>>>>>>
>>>>>> Well, it means there's still some other problem.  I was hoping
>>>>>> that because the rainbow pass system originally exhibited the
>>>>>> same symptoms (interrupt storm) fixing it would also fix the t490
>>>>>> and the ineffective EOI bug looked like a great candidate for
>>>>>> being the root cause.
>>>>>>
>>>>>
>>>>> Adding Hans to the list.
>>>>>
>>>>> IIUC in the t490s case the problem lies with the hardware itself.
>>>>> Hans, is that correct?
>>>>
>>>> More or less. AFAIK / have been told by Lenovo it is an issue with
>>>> the configuration of the inerrupt-type of the GPIO pin used for the
>>>> IRQ, which is a firmware issue which could be fixed by a BIOS update
>>>> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
>>>> control of the IRQ type since with APIC irqs this is all supposed to
>>>> be setup properly before hand).
>>>>
>>>> But it is a model specific issue, if we denylist IRQ usage on this
>>>> Lenovo model (and probably a few others) then we should be able to
>>>> restore the IRQ code to normal functionality for all other device
>>>> models which declare an IRQ in their resource tables.
>>> I can do that with a quirk, but how do I identify the device?  TPM
>>> manufacturer and version? or do I have to use something like the ACPI
>>> bios version?
>>
>> I'm not sure if the TPM ids are unique to one model/series of laptops.
>>
>> So my idea for this was to match on DMI strings, specifically
>> use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
>> strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
>> devices the string which you expect to be in DMI_PRODUCT_NAME
>> is actually in DMI_PRODUCT_VERSION).
>>
>> You can easily get the strings for your device by doing:
>>
>> cat /sys/class/dmi/id/sys_vendor
>> cat /sys/class/dmi/id/product_version
>>
>> Regards,
>>
>> Hans
> 
> Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
> if the bios is older than the fixed bios? Has Lenovo
> released the fixed bios?

Maybe, the fixed BIOS-es which I have seen (for the X1C8,
broken BIOS was a pre-production BIOS) "fixed" this by
no longer listing an IRQ in the ACPI resources for the TPM.

Which means that the new BIOS still being on the deny list
does not matter since the IRQ support won't work anyways as
we no longer get an IRQ assigned.

So I don't think this is necessary and it will just complicate
things unnecessarily. This whole saga has already taken way
too long to fix. So IMHO the simplest fix where we just deny
list the broken models independent of BIOS versions and move
on seems best.

Regards,

Hans
James Bottomley Oct. 14, 2020, 4:49 p.m. UTC | #12
On Wed, 2020-10-14 at 09:34 -0700, Jerry Snitselaar wrote:
> Hans de Goede @ 2020-10-14 09:04 MST:
> 
> > Hi,
> > 
> > On 10/14/20 5:23 PM, James Bottomley wrote:
> > > On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
> > > > On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
> > > > > James Bottomley @ 2020-10-13 08:24 MST:
> > > > > > On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
> > > > > > > Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> > > [...]
> > > > > > > >    Jerry, once you have some bandwidth (no rush, does
> > > > > > > > not land
> > > > > > > > before rc2), it would be great that if you could try
> > > > > > > > this.
> > > > > > > > I'm emphasizing this just because of the intersection.
> > > > > > > > I
> > > > > > > > think it would also make senset to get tested-by from
> > > > > > > > Nayna.
> > > > > > > 
> > > > > > > I will run some tests on some other systems I have access
> > > > > > > to.
> > > > > > > As noted in the other email I did a quick test with a
> > > > > > > t490s
> > > > > > > with an older bios that exhibits the problem originally
> > > > > > > reported when Stefan's patch enabled interrupts.
> > > > > > 
> > > > > > Well, it means there's still some other problem.  I was
> > > > > > hoping
> > > > > > that because the rainbow pass system originally exhibited
> > > > > > the
> > > > > > same symptoms (interrupt storm) fixing it would also fix
> > > > > > the t490
> > > > > > and the ineffective EOI bug looked like a great candidate
> > > > > > for
> > > > > > being the root cause.
> > > > > > 
> > > > > 
> > > > > Adding Hans to the list.
> > > > > 
> > > > > IIUC in the t490s case the problem lies with the hardware
> > > > > itself.
> > > > > Hans, is that correct?
> > > > 
> > > > More or less. AFAIK / have been told by Lenovo it is an issue
> > > > with
> > > > the configuration of the inerrupt-type of the GPIO pin used for
> > > > the
> > > > IRQ, which is a firmware issue which could be fixed by a BIOS
> > > > update
> > > > (the pin is setup as a direct-irq pin for the APIC, so the OS
> > > > has no
> > > > control of the IRQ type since with APIC irqs this is all
> > > > supposed to
> > > > be setup properly before hand).
> > > > 
> > > > But it is a model specific issue, if we denylist IRQ usage on
> > > > this
> > > > Lenovo model (and probably a few others) then we should be able
> > > > to
> > > > restore the IRQ code to normal functionality for all other
> > > > device
> > > > models which declare an IRQ in their resource tables.
> > > I can do that with a quirk, but how do I identify the
> > > device?  TPM
> > > manufacturer and version? or do I have to use something like the
> > > ACPI
> > > bios version?
> > 
> > I'm not sure if the TPM ids are unique to one model/series of
> > laptops.
> > 
> > So my idea for this was to match on DMI strings, specifically
> > use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
> > strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
> > devices the string which you expect to be in DMI_PRODUCT_NAME
> > is actually in DMI_PRODUCT_VERSION).
> > 
> > You can easily get the strings for your device by doing:
> > 
> > cat /sys/class/dmi/id/sys_vendor
> > cat /sys/class/dmi/id/product_version
> > 
> > Regards,
> > 
> > Hans
> 
> Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
> if the bios is older than the fixed bios? Has Lenovo
> released the fixed bios?

Yes, get all the DMI before and after information, but ideally also the
TIS before and after information.  That's what you get from the tpm_tis
line in dmesg:

tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFE, rev-id 2)

What I was hoping is rev-id might change.  However, if the bug is
specific to the Lenovo firmware and not the TPM firmware, that is
unlikely.

James
Jerry Snitselaar Oct. 14, 2020, 5:01 p.m. UTC | #13
Hans de Goede @ 2020-10-14 09:46 MST:

> Hi,
>
> On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
>> Hans de Goede @ 2020-10-14 09:04 MST:
>> 
>>> Hi,
>>>
>>> On 10/14/20 5:23 PM, James Bottomley wrote:
>>>> On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>>>>> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>>>>>> James Bottomley @ 2020-10-13 08:24 MST:
>>>>>>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>>>>>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>>>> [...]
>>>>>>>>>     Jerry, once you have some bandwidth (no rush, does not land
>>>>>>>>> before rc2), it would be great that if you could try this.
>>>>>>>>> I'm emphasizing this just because of the intersection. I
>>>>>>>>> think it would also make senset to get tested-by from Nayna.
>>>>>>>>
>>>>>>>> I will run some tests on some other systems I have access to.
>>>>>>>> As noted in the other email I did a quick test with a t490s
>>>>>>>> with an older bios that exhibits the problem originally
>>>>>>>> reported when Stefan's patch enabled interrupts.
>>>>>>>
>>>>>>> Well, it means there's still some other problem.  I was hoping
>>>>>>> that because the rainbow pass system originally exhibited the
>>>>>>> same symptoms (interrupt storm) fixing it would also fix the t490
>>>>>>> and the ineffective EOI bug looked like a great candidate for
>>>>>>> being the root cause.
>>>>>>>
>>>>>>
>>>>>> Adding Hans to the list.
>>>>>>
>>>>>> IIUC in the t490s case the problem lies with the hardware itself.
>>>>>> Hans, is that correct?
>>>>>
>>>>> More or less. AFAIK / have been told by Lenovo it is an issue with
>>>>> the configuration of the inerrupt-type of the GPIO pin used for the
>>>>> IRQ, which is a firmware issue which could be fixed by a BIOS update
>>>>> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
>>>>> control of the IRQ type since with APIC irqs this is all supposed to
>>>>> be setup properly before hand).
>>>>>
>>>>> But it is a model specific issue, if we denylist IRQ usage on this
>>>>> Lenovo model (and probably a few others) then we should be able to
>>>>> restore the IRQ code to normal functionality for all other device
>>>>> models which declare an IRQ in their resource tables.
>>>> I can do that with a quirk, but how do I identify the device?  TPM
>>>> manufacturer and version? or do I have to use something like the ACPI
>>>> bios version?
>>>
>>> I'm not sure if the TPM ids are unique to one model/series of laptops.
>>>
>>> So my idea for this was to match on DMI strings, specifically
>>> use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
>>> strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
>>> devices the string which you expect to be in DMI_PRODUCT_NAME
>>> is actually in DMI_PRODUCT_VERSION).
>>>
>>> You can easily get the strings for your device by doing:
>>>
>>> cat /sys/class/dmi/id/sys_vendor
>>> cat /sys/class/dmi/id/product_version
>>>
>>> Regards,
>>>
>>> Hans
>> Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
>> if the bios is older than the fixed bios? Has Lenovo
>> released the fixed bios?
>
> Maybe, the fixed BIOS-es which I have seen (for the X1C8,
> broken BIOS was a pre-production BIOS) "fixed" this by
> no longer listing an IRQ in the ACPI resources for the TPM.
>
> Which means that the new BIOS still being on the deny list
> does not matter since the IRQ support won't work anyways as
> we no longer get an IRQ assigned.
>
> So I don't think this is necessary and it will just complicate
> things unnecessarily. This whole saga has already taken way
> too long to fix. So IMHO the simplest fix where we just deny
> list the broken models independent of BIOS versions and move
> on seems best.
>
> Regards,
>
> Hans

Yea, probably just best to disable for the model and be done
with it.

Regards,
Jerry
Jerry Snitselaar Oct. 14, 2020, 5:04 p.m. UTC | #14
Jerry Snitselaar @ 2020-10-14 10:01 MST:

> Hans de Goede @ 2020-10-14 09:46 MST:
>
>> Hi,
>>
>> On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
>>> Hans de Goede @ 2020-10-14 09:04 MST:
>>> 
>>>> Hi,
>>>>
>>>> On 10/14/20 5:23 PM, James Bottomley wrote:
>>>>> On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>>>>>> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>>>>>>> James Bottomley @ 2020-10-13 08:24 MST:
>>>>>>>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>>>>>>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>>>>> [...]
>>>>>>>>>>     Jerry, once you have some bandwidth (no rush, does not land
>>>>>>>>>> before rc2), it would be great that if you could try this.
>>>>>>>>>> I'm emphasizing this just because of the intersection. I
>>>>>>>>>> think it would also make senset to get tested-by from Nayna.
>>>>>>>>>
>>>>>>>>> I will run some tests on some other systems I have access to.
>>>>>>>>> As noted in the other email I did a quick test with a t490s
>>>>>>>>> with an older bios that exhibits the problem originally
>>>>>>>>> reported when Stefan's patch enabled interrupts.
>>>>>>>>
>>>>>>>> Well, it means there's still some other problem.  I was hoping
>>>>>>>> that because the rainbow pass system originally exhibited the
>>>>>>>> same symptoms (interrupt storm) fixing it would also fix the t490
>>>>>>>> and the ineffective EOI bug looked like a great candidate for
>>>>>>>> being the root cause.
>>>>>>>>
>>>>>>>
>>>>>>> Adding Hans to the list.
>>>>>>>
>>>>>>> IIUC in the t490s case the problem lies with the hardware itself.
>>>>>>> Hans, is that correct?
>>>>>>
>>>>>> More or less. AFAIK / have been told by Lenovo it is an issue with
>>>>>> the configuration of the inerrupt-type of the GPIO pin used for the
>>>>>> IRQ, which is a firmware issue which could be fixed by a BIOS update
>>>>>> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
>>>>>> control of the IRQ type since with APIC irqs this is all supposed to
>>>>>> be setup properly before hand).
>>>>>>
>>>>>> But it is a model specific issue, if we denylist IRQ usage on this
>>>>>> Lenovo model (and probably a few others) then we should be able to
>>>>>> restore the IRQ code to normal functionality for all other device
>>>>>> models which declare an IRQ in their resource tables.
>>>>> I can do that with a quirk, but how do I identify the device?  TPM
>>>>> manufacturer and version? or do I have to use something like the ACPI
>>>>> bios version?
>>>>
>>>> I'm not sure if the TPM ids are unique to one model/series of laptops.
>>>>
>>>> So my idea for this was to match on DMI strings, specifically
>>>> use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
>>>> strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
>>>> devices the string which you expect to be in DMI_PRODUCT_NAME
>>>> is actually in DMI_PRODUCT_VERSION).
>>>>
>>>> You can easily get the strings for your device by doing:
>>>>
>>>> cat /sys/class/dmi/id/sys_vendor
>>>> cat /sys/class/dmi/id/product_version
>>>>
>>>> Regards,
>>>>
>>>> Hans
>>> Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
>>> if the bios is older than the fixed bios? Has Lenovo
>>> released the fixed bios?
>>
>> Maybe, the fixed BIOS-es which I have seen (for the X1C8,
>> broken BIOS was a pre-production BIOS) "fixed" this by
>> no longer listing an IRQ in the ACPI resources for the TPM.
>>
>> Which means that the new BIOS still being on the deny list
>> does not matter since the IRQ support won't work anyways as
>> we no longer get an IRQ assigned.
>>
>> So I don't think this is necessary and it will just complicate
>> things unnecessarily. This whole saga has already taken way
>> too long to fix. So IMHO the simplest fix where we just deny
>> list the broken models independent of BIOS versions and move
>> on seems best.
>>
>> Regards,
>>
>> Hans
>
> Yea, probably just best to disable for the model and be done
> with it.
>
> Regards,
> Jerry

# cat /sys/class/dmi/id/sys_vendor
LENOVO
# cat /sys/class/dmi/id/product_version
ThinkPad T490s
Jerry Snitselaar Oct. 14, 2020, 8:58 p.m. UTC | #15
Hans de Goede @ 2020-10-14 09:46 MST:

> Hi,
>
> On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
>> Hans de Goede @ 2020-10-14 09:04 MST:
>> 
>>> Hi,
>>>
>>> On 10/14/20 5:23 PM, James Bottomley wrote:
>>>> On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>>>>> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>>>>>> James Bottomley @ 2020-10-13 08:24 MST:
>>>>>>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>>>>>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>>>> [...]
>>>>>>>>>     Jerry, once you have some bandwidth (no rush, does not land
>>>>>>>>> before rc2), it would be great that if you could try this.
>>>>>>>>> I'm emphasizing this just because of the intersection. I
>>>>>>>>> think it would also make senset to get tested-by from Nayna.
>>>>>>>>
>>>>>>>> I will run some tests on some other systems I have access to.
>>>>>>>> As noted in the other email I did a quick test with a t490s
>>>>>>>> with an older bios that exhibits the problem originally
>>>>>>>> reported when Stefan's patch enabled interrupts.
>>>>>>>
>>>>>>> Well, it means there's still some other problem.  I was hoping
>>>>>>> that because the rainbow pass system originally exhibited the
>>>>>>> same symptoms (interrupt storm) fixing it would also fix the t490
>>>>>>> and the ineffective EOI bug looked like a great candidate for
>>>>>>> being the root cause.
>>>>>>>
>>>>>>
>>>>>> Adding Hans to the list.
>>>>>>
>>>>>> IIUC in the t490s case the problem lies with the hardware itself.
>>>>>> Hans, is that correct?
>>>>>
>>>>> More or less. AFAIK / have been told by Lenovo it is an issue with
>>>>> the configuration of the inerrupt-type of the GPIO pin used for the
>>>>> IRQ, which is a firmware issue which could be fixed by a BIOS update
>>>>> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
>>>>> control of the IRQ type since with APIC irqs this is all supposed to
>>>>> be setup properly before hand).
>>>>>
>>>>> But it is a model specific issue, if we denylist IRQ usage on this
>>>>> Lenovo model (and probably a few others) then we should be able to
>>>>> restore the IRQ code to normal functionality for all other device
>>>>> models which declare an IRQ in their resource tables.
>>>> I can do that with a quirk, but how do I identify the device?  TPM
>>>> manufacturer and version? or do I have to use something like the ACPI
>>>> bios version?
>>>
>>> I'm not sure if the TPM ids are unique to one model/series of laptops.
>>>
>>> So my idea for this was to match on DMI strings, specifically
>>> use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
>>> strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
>>> devices the string which you expect to be in DMI_PRODUCT_NAME
>>> is actually in DMI_PRODUCT_VERSION).
>>>
>>> You can easily get the strings for your device by doing:
>>>
>>> cat /sys/class/dmi/id/sys_vendor
>>> cat /sys/class/dmi/id/product_version
>>>
>>> Regards,
>>>
>>> Hans
>> Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
>> if the bios is older than the fixed bios? Has Lenovo
>> released the fixed bios?
>
> Maybe, the fixed BIOS-es which I have seen (for the X1C8,
> broken BIOS was a pre-production BIOS) "fixed" this by
> no longer listing an IRQ in the ACPI resources for the TPM.
>
> Which means that the new BIOS still being on the deny list
> does not matter since the IRQ support won't work anyways as
> we no longer get an IRQ assigned.
>
> So I don't think this is necessary and it will just complicate
> things unnecessarily. This whole saga has already taken way
> too long to fix. So IMHO the simplest fix where we just deny
> list the broken models independent of BIOS versions and move
> on seems best.
>
> Regards,
>
> Hans

This worked for me:

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 0b214963539d..abe674d1de6d 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -27,6 +27,7 @@
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/kernel.h>
+#include <linux/dmi.h>
 #include "tpm.h"
 #include "tpm_tis_core.h"

@@ -63,6 +64,26 @@ module_param(force, bool, 0444);
 MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
 #endif

+static int tpm_tis_disable_irq(const struct dmi_system_id *d)
+{
+       pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
+       interrupts = false;
+
+       return 0;
+}
+
+static const struct dmi_system_id tpm_tis_dmi_table[] = {
+       {
+               .callback = tpm_tis_disable_irq,
+               .ident = "ThinkPad T490s",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
+               },
+       },
+       {}
+};
+
 #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
 static int has_hid(struct acpi_device *dev, const char *hid)
 {
@@ -192,6 +213,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
        int irq = -1;
        int rc;

+       dmi_check_system(tpm_tis_dmi_table);
+
        rc = check_acpi_tpm2(dev);
        if (rc)
                return rc;
Hans de Goede Oct. 15, 2020, 7:38 a.m. UTC | #16
Hi,

On 10/14/20 10:58 PM, Jerry Snitselaar wrote:
> 
> Hans de Goede @ 2020-10-14 09:46 MST:
> 
>> Hi,
>>
>> On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
>>> Hans de Goede @ 2020-10-14 09:04 MST:
>>>
>>>> Hi,
>>>>
>>>> On 10/14/20 5:23 PM, James Bottomley wrote:
>>>>> On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>>>>>> On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>>>>>>> James Bottomley @ 2020-10-13 08:24 MST:
>>>>>>>> On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
>>>>>>>>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>>>>> [...]
>>>>>>>>>>      Jerry, once you have some bandwidth (no rush, does not land
>>>>>>>>>> before rc2), it would be great that if you could try this.
>>>>>>>>>> I'm emphasizing this just because of the intersection. I
>>>>>>>>>> think it would also make senset to get tested-by from Nayna.
>>>>>>>>>
>>>>>>>>> I will run some tests on some other systems I have access to.
>>>>>>>>> As noted in the other email I did a quick test with a t490s
>>>>>>>>> with an older bios that exhibits the problem originally
>>>>>>>>> reported when Stefan's patch enabled interrupts.
>>>>>>>>
>>>>>>>> Well, it means there's still some other problem.  I was hoping
>>>>>>>> that because the rainbow pass system originally exhibited the
>>>>>>>> same symptoms (interrupt storm) fixing it would also fix the t490
>>>>>>>> and the ineffective EOI bug looked like a great candidate for
>>>>>>>> being the root cause.
>>>>>>>>
>>>>>>>
>>>>>>> Adding Hans to the list.
>>>>>>>
>>>>>>> IIUC in the t490s case the problem lies with the hardware itself.
>>>>>>> Hans, is that correct?
>>>>>>
>>>>>> More or less. AFAIK / have been told by Lenovo it is an issue with
>>>>>> the configuration of the inerrupt-type of the GPIO pin used for the
>>>>>> IRQ, which is a firmware issue which could be fixed by a BIOS update
>>>>>> (the pin is setup as a direct-irq pin for the APIC, so the OS has no
>>>>>> control of the IRQ type since with APIC irqs this is all supposed to
>>>>>> be setup properly before hand).
>>>>>>
>>>>>> But it is a model specific issue, if we denylist IRQ usage on this
>>>>>> Lenovo model (and probably a few others) then we should be able to
>>>>>> restore the IRQ code to normal functionality for all other device
>>>>>> models which declare an IRQ in their resource tables.
>>>>> I can do that with a quirk, but how do I identify the device?  TPM
>>>>> manufacturer and version? or do I have to use something like the ACPI
>>>>> bios version?
>>>>
>>>> I'm not sure if the TPM ids are unique to one model/series of laptops.
>>>>
>>>> So my idea for this was to match on DMI strings, specifically
>>>> use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
>>>> strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
>>>> devices the string which you expect to be in DMI_PRODUCT_NAME
>>>> is actually in DMI_PRODUCT_VERSION).
>>>>
>>>> You can easily get the strings for your device by doing:
>>>>
>>>> cat /sys/class/dmi/id/sys_vendor
>>>> cat /sys/class/dmi/id/product_version
>>>>
>>>> Regards,
>>>>
>>>> Hans
>>> Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
>>> if the bios is older than the fixed bios? Has Lenovo
>>> released the fixed bios?
>>
>> Maybe, the fixed BIOS-es which I have seen (for the X1C8,
>> broken BIOS was a pre-production BIOS) "fixed" this by
>> no longer listing an IRQ in the ACPI resources for the TPM.
>>
>> Which means that the new BIOS still being on the deny list
>> does not matter since the IRQ support won't work anyways as
>> we no longer get an IRQ assigned.
>>
>> So I don't think this is necessary and it will just complicate
>> things unnecessarily. This whole saga has already taken way
>> too long to fix. So IMHO the simplest fix where we just deny
>> list the broken models independent of BIOS versions and move
>> on seems best.
>>
>> Regards,
>>
>> Hans
> 
> This worked for me:

That looks good to me, can you submit this upstream please ?

If you Cc me I'll give it my Reviewed-by.

Regards,

Hans


> 
> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 0b214963539d..abe674d1de6d 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -27,6 +27,7 @@
>   #include <linux/of.h>
>   #include <linux/of_device.h>
>   #include <linux/kernel.h>
> +#include <linux/dmi.h>
>   #include "tpm.h"
>   #include "tpm_tis_core.h"
> 
> @@ -63,6 +64,26 @@ module_param(force, bool, 0444);
>   MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
>   #endif
> 
> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> +{
> +       pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
> +       interrupts = false;
> +
> +       return 0;
> +}
> +
> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> +       {
> +               .callback = tpm_tis_disable_irq,
> +               .ident = "ThinkPad T490s",
> +               .matches = {
> +                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
> +               },
> +       },
> +       {}
> +};
> +
>   #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>   static int has_hid(struct acpi_device *dev, const char *hid)
>   {
> @@ -192,6 +213,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
>          int irq = -1;
>          int rc;
> 
> +       dmi_check_system(tpm_tis_dmi_table);
> +
>          rc = check_acpi_tpm2(dev);
>          if (rc)
>                  return rc;
>
James Bottomley Oct. 15, 2020, 3:36 p.m. UTC | #17
On Wed, 2020-10-14 at 13:58 -0700, Jerry Snitselaar wrote:
> Hans de Goede @ 2020-10-14 09:46 MST:
> 
> > Hi,
> > 
> > On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
> > > Hans de Goede @ 2020-10-14 09:04 MST:
> > > 
> > > > Hi,
> > > > 
> > > > On 10/14/20 5:23 PM, James Bottomley wrote:
> > > > > On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
> > > > > > On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
> > > > > > > James Bottomley @ 2020-10-13 08:24 MST:
> > > > > > > > On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar
> > > > > > > > wrote:
> > > > > > > > > Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> > > > > [...]
> > > > > > > > > >     Jerry, once you have some bandwidth (no rush,
> > > > > > > > > > does not land
> > > > > > > > > > before rc2), it would be great that if you could
> > > > > > > > > > try this.
> > > > > > > > > > I'm emphasizing this just because of the
> > > > > > > > > > intersection. I
> > > > > > > > > > think it would also make senset to get tested-by
> > > > > > > > > > from Nayna.
> > > > > > > > > 
> > > > > > > > > I will run some tests on some other systems I have
> > > > > > > > > access to.
> > > > > > > > > As noted in the other email I did a quick test with a
> > > > > > > > > t490s
> > > > > > > > > with an older bios that exhibits the problem
> > > > > > > > > originally
> > > > > > > > > reported when Stefan's patch enabled interrupts.
> > > > > > > > 
> > > > > > > > Well, it means there's still some other problem.  I was
> > > > > > > > hoping
> > > > > > > > that because the rainbow pass system originally
> > > > > > > > exhibited the
> > > > > > > > same symptoms (interrupt storm) fixing it would also
> > > > > > > > fix the t490
> > > > > > > > and the ineffective EOI bug looked like a great
> > > > > > > > candidate for
> > > > > > > > being the root cause.
> > > > > > > > 
> > > > > > > 
> > > > > > > Adding Hans to the list.
> > > > > > > 
> > > > > > > IIUC in the t490s case the problem lies with the hardware
> > > > > > > itself.
> > > > > > > Hans, is that correct?
> > > > > > 
> > > > > > More or less. AFAIK / have been told by Lenovo it is an
> > > > > > issue with
> > > > > > the configuration of the inerrupt-type of the GPIO pin used
> > > > > > for the
> > > > > > IRQ, which is a firmware issue which could be fixed by a
> > > > > > BIOS update
> > > > > > (the pin is setup as a direct-irq pin for the APIC, so the
> > > > > > OS has no
> > > > > > control of the IRQ type since with APIC irqs this is all
> > > > > > supposed to
> > > > > > be setup properly before hand).
> > > > > > 
> > > > > > But it is a model specific issue, if we denylist IRQ usage
> > > > > > on this
> > > > > > Lenovo model (and probably a few others) then we should be
> > > > > > able to
> > > > > > restore the IRQ code to normal functionality for all other
> > > > > > device
> > > > > > models which declare an IRQ in their resource tables.
> > > > > I can do that with a quirk, but how do I identify the
> > > > > device?  TPM
> > > > > manufacturer and version? or do I have to use something like
> > > > > the ACPI
> > > > > bios version?
> > > > 
> > > > I'm not sure if the TPM ids are unique to one model/series of
> > > > laptops.
> > > > 
> > > > So my idea for this was to match on DMI strings, specifically
> > > > use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
> > > > strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
> > > > devices the string which you expect to be in DMI_PRODUCT_NAME
> > > > is actually in DMI_PRODUCT_VERSION).
> > > > 
> > > > You can easily get the strings for your device by doing:
> > > > 
> > > > cat /sys/class/dmi/id/sys_vendor
> > > > cat /sys/class/dmi/id/product_version
> > > > 
> > > > Regards,
> > > > 
> > > > Hans
> > > Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
> > > if the bios is older than the fixed bios? Has Lenovo
> > > released the fixed bios?
> > 
> > Maybe, the fixed BIOS-es which I have seen (for the X1C8,
> > broken BIOS was a pre-production BIOS) "fixed" this by
> > no longer listing an IRQ in the ACPI resources for the TPM.
> > 
> > Which means that the new BIOS still being on the deny list
> > does not matter since the IRQ support won't work anyways as
> > we no longer get an IRQ assigned.
> > 
> > So I don't think this is necessary and it will just complicate
> > things unnecessarily. This whole saga has already taken way
> > too long to fix. So IMHO the simplest fix where we just deny
> > list the broken models independent of BIOS versions and move
> > on seems best.
> > 
> > Regards,
> > 
> > Hans
> 
> This worked for me:
> 
> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 0b214963539d..abe674d1de6d 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -27,6 +27,7 @@
>  #include <linux/of.h>
>  #include <linux/of_device.h>
>  #include <linux/kernel.h>
> +#include <linux/dmi.h>
>  #include "tpm.h"
>  #include "tpm_tis_core.h"
> 
> @@ -63,6 +64,26 @@ module_param(force, bool, 0444);
>  MODULE_PARM_DESC(force, "Force device probe rather than using ACPI
> entry");
>  #endif
> 
> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> +{
> +       pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d-
> >ident);
> +       interrupts = false;
> +
> +       return 0;
> +}
> +
> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> +       {
> +               .callback = tpm_tis_disable_irq,
> +               .ident = "ThinkPad T490s",
> +               .matches = {
> +                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad
> T490s"),
> +               },
> +       },
> +       {}
> +};
> +
>  #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>  static int has_hid(struct acpi_device *dev, const char *hid)
>  {
> @@ -192,6 +213,8 @@ static int tpm_tis_init(struct device *dev,
> struct tpm_info *tpm_info)
>         int irq = -1;
>         int rc;
> 
> +       dmi_check_system(tpm_tis_dmi_table);
> +
>         rc = check_acpi_tpm2(dev);
>         if (rc)
>                 return rc;

This looks OK to me with the caveat that anyone on one of these systems
has no way to enable interrupts again if they think they have a fixed
bios.  What about making interrupts a tristate with the default value
-1?  Then in the dmi check, if we see -1 we set it to 0 but if we see 1
(the user has specified interrupts=1 on the module insert line) we
leave it?

James
Jerry Snitselaar Oct. 15, 2020, 6:48 p.m. UTC | #18
James Bottomley @ 2020-10-15 08:36 MST:

> On Wed, 2020-10-14 at 13:58 -0700, Jerry Snitselaar wrote:
>> Hans de Goede @ 2020-10-14 09:46 MST:
>> 
>> > Hi,
>> > 
>> > On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
>> > > Hans de Goede @ 2020-10-14 09:04 MST:
>> > > 
>> > > > Hi,
>> > > > 
>> > > > On 10/14/20 5:23 PM, James Bottomley wrote:
>> > > > > On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>> > > > > > On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>> > > > > > > James Bottomley @ 2020-10-13 08:24 MST:
>> > > > > > > > On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar
>> > > > > > > > wrote:
>> > > > > > > > > Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>> > > > > [...]
>> > > > > > > > > >     Jerry, once you have some bandwidth (no rush,
>> > > > > > > > > > does not land
>> > > > > > > > > > before rc2), it would be great that if you could
>> > > > > > > > > > try this.
>> > > > > > > > > > I'm emphasizing this just because of the
>> > > > > > > > > > intersection. I
>> > > > > > > > > > think it would also make senset to get tested-by
>> > > > > > > > > > from Nayna.
>> > > > > > > > > 
>> > > > > > > > > I will run some tests on some other systems I have
>> > > > > > > > > access to.
>> > > > > > > > > As noted in the other email I did a quick test with a
>> > > > > > > > > t490s
>> > > > > > > > > with an older bios that exhibits the problem
>> > > > > > > > > originally
>> > > > > > > > > reported when Stefan's patch enabled interrupts.
>> > > > > > > > 
>> > > > > > > > Well, it means there's still some other problem.  I was
>> > > > > > > > hoping
>> > > > > > > > that because the rainbow pass system originally
>> > > > > > > > exhibited the
>> > > > > > > > same symptoms (interrupt storm) fixing it would also
>> > > > > > > > fix the t490
>> > > > > > > > and the ineffective EOI bug looked like a great
>> > > > > > > > candidate for
>> > > > > > > > being the root cause.
>> > > > > > > > 
>> > > > > > > 
>> > > > > > > Adding Hans to the list.
>> > > > > > > 
>> > > > > > > IIUC in the t490s case the problem lies with the hardware
>> > > > > > > itself.
>> > > > > > > Hans, is that correct?
>> > > > > > 
>> > > > > > More or less. AFAIK / have been told by Lenovo it is an
>> > > > > > issue with
>> > > > > > the configuration of the inerrupt-type of the GPIO pin used
>> > > > > > for the
>> > > > > > IRQ, which is a firmware issue which could be fixed by a
>> > > > > > BIOS update
>> > > > > > (the pin is setup as a direct-irq pin for the APIC, so the
>> > > > > > OS has no
>> > > > > > control of the IRQ type since with APIC irqs this is all
>> > > > > > supposed to
>> > > > > > be setup properly before hand).
>> > > > > > 
>> > > > > > But it is a model specific issue, if we denylist IRQ usage
>> > > > > > on this
>> > > > > > Lenovo model (and probably a few others) then we should be
>> > > > > > able to
>> > > > > > restore the IRQ code to normal functionality for all other
>> > > > > > device
>> > > > > > models which declare an IRQ in their resource tables.
>> > > > > I can do that with a quirk, but how do I identify the
>> > > > > device?  TPM
>> > > > > manufacturer and version? or do I have to use something like
>> > > > > the ACPI
>> > > > > bios version?
>> > > > 
>> > > > I'm not sure if the TPM ids are unique to one model/series of
>> > > > laptops.
>> > > > 
>> > > > So my idea for this was to match on DMI strings, specifically
>> > > > use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
>> > > > strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
>> > > > devices the string which you expect to be in DMI_PRODUCT_NAME
>> > > > is actually in DMI_PRODUCT_VERSION).
>> > > > 
>> > > > You can easily get the strings for your device by doing:
>> > > > 
>> > > > cat /sys/class/dmi/id/sys_vendor
>> > > > cat /sys/class/dmi/id/product_version
>> > > > 
>> > > > Regards,
>> > > > 
>> > > > Hans
>> > > Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
>> > > if the bios is older than the fixed bios? Has Lenovo
>> > > released the fixed bios?
>> > 
>> > Maybe, the fixed BIOS-es which I have seen (for the X1C8,
>> > broken BIOS was a pre-production BIOS) "fixed" this by
>> > no longer listing an IRQ in the ACPI resources for the TPM.
>> > 
>> > Which means that the new BIOS still being on the deny list
>> > does not matter since the IRQ support won't work anyways as
>> > we no longer get an IRQ assigned.
>> > 
>> > So I don't think this is necessary and it will just complicate
>> > things unnecessarily. This whole saga has already taken way
>> > too long to fix. So IMHO the simplest fix where we just deny
>> > list the broken models independent of BIOS versions and move
>> > on seems best.
>> > 
>> > Regards,
>> > 
>> > Hans
>> 
>> This worked for me:
>> 
>> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
>> index 0b214963539d..abe674d1de6d 100644
>> --- a/drivers/char/tpm/tpm_tis.c
>> +++ b/drivers/char/tpm/tpm_tis.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_device.h>
>>  #include <linux/kernel.h>
>> +#include <linux/dmi.h>
>>  #include "tpm.h"
>>  #include "tpm_tis_core.h"
>> 
>> @@ -63,6 +64,26 @@ module_param(force, bool, 0444);
>>  MODULE_PARM_DESC(force, "Force device probe rather than using ACPI
>> entry");
>>  #endif
>> 
>> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
>> +{
>> +       pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d-
>> >ident);
>> +       interrupts = false;
>> +
>> +       return 0;
>> +}
>> +
>> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
>> +       {
>> +               .callback = tpm_tis_disable_irq,
>> +               .ident = "ThinkPad T490s",
>> +               .matches = {
>> +                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>> +                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad
>> T490s"),
>> +               },
>> +       },
>> +       {}
>> +};
>> +
>>  #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>>  static int has_hid(struct acpi_device *dev, const char *hid)
>>  {
>> @@ -192,6 +213,8 @@ static int tpm_tis_init(struct device *dev,
>> struct tpm_info *tpm_info)
>>         int irq = -1;
>>         int rc;
>> 
>> +       dmi_check_system(tpm_tis_dmi_table);
>> +
>>         rc = check_acpi_tpm2(dev);
>>         if (rc)
>>                 return rc;
>
> This looks OK to me with the caveat that anyone on one of these systems
> has no way to enable interrupts again if they think they have a fixed
> bios.  What about making interrupts a tristate with the default value
> -1?  Then in the dmi check, if we see -1 we set it to 0 but if we see 1
> (the user has specified interrupts=1 on the module insert line) we
> leave it?
>
> James

like this?

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 0b214963539d..10c46cb26c5a 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -27,6 +27,7 @@
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/kernel.h>
+#include <linux/dmi.h>
 #include "tpm.h"
 #include "tpm_tis_core.h"

@@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
        return container_of(data, struct tpm_tis_tcg_phy, priv);
 }

-static bool interrupts = true;
-module_param(interrupts, bool, 0444);
+static int interrupts = -1;
+module_param(interrupts, int, 0444);
 MODULE_PARM_DESC(interrupts, "Enable interrupts");

 static bool itpm;
@@ -63,6 +64,27 @@ module_param(force, bool, 0444);
 MODULE_PARM_DESC(force, "Force device probe rather than using ACPI entry");
 #endif

+static int tpm_tis_disable_irq(const struct dmi_system_id *d)
+{
+       pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d->ident);
+       if (interrupts == -1)
+               interrupts = 0;
+
+       return 0;
+}
+
+static const struct dmi_system_id tpm_tis_dmi_table[] = {
+       {
+               .callback = tpm_tis_disable_irq,
+               .ident = "ThinkPad T490s",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T490s"),
+               },
+       },
+       {}
+};
+
 #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
 static int has_hid(struct acpi_device *dev, const char *hid)
 {
@@ -192,6 +214,8 @@ static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info)
        int irq = -1;
        int rc;

+       dmi_check_system(tpm_tis_dmi_table);
+
        rc = check_acpi_tpm2(dev);
        if (rc)
                return rc;
James Bottomley Oct. 15, 2020, 6:57 p.m. UTC | #19
On Thu, 2020-10-15 at 11:48 -0700, Jerry Snitselaar wrote:
> James Bottomley @ 2020-10-15 08:36 MST:
> 
> > On Wed, 2020-10-14 at 13:58 -0700, Jerry Snitselaar wrote:
> > > Hans de Goede @ 2020-10-14 09:46 MST:
> > > 
> > > > Hi,
> > > > 
> > > > On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
> > > > > Hans de Goede @ 2020-10-14 09:04 MST:
> > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > > On 10/14/20 5:23 PM, James Bottomley wrote:
> > > > > > > On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
> > > > > > > > On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
> > > > > > > > > James Bottomley @ 2020-10-13 08:24 MST:
> > > > > > > > > > On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar
> > > > > > > > > > wrote:
> > > > > > > > > > > Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> > > > > > > [...]
> > > > > > > > > > > >     Jerry, once you have some bandwidth (no
> > > > > > > > > > > > rush,
> > > > > > > > > > > > does not land
> > > > > > > > > > > > before rc2), it would be great that if you
> > > > > > > > > > > > could
> > > > > > > > > > > > try this.
> > > > > > > > > > > > I'm emphasizing this just because of the
> > > > > > > > > > > > intersection. I
> > > > > > > > > > > > think it would also make senset to get tested-
> > > > > > > > > > > > by
> > > > > > > > > > > > from Nayna.
> > > > > > > > > > > 
> > > > > > > > > > > I will run some tests on some other systems I
> > > > > > > > > > > have
> > > > > > > > > > > access to.
> > > > > > > > > > > As noted in the other email I did a quick test
> > > > > > > > > > > with a
> > > > > > > > > > > t490s
> > > > > > > > > > > with an older bios that exhibits the problem
> > > > > > > > > > > originally
> > > > > > > > > > > reported when Stefan's patch enabled interrupts.
> > > > > > > > > > 
> > > > > > > > > > Well, it means there's still some other problem.  I
> > > > > > > > > > was
> > > > > > > > > > hoping
> > > > > > > > > > that because the rainbow pass system originally
> > > > > > > > > > exhibited the
> > > > > > > > > > same symptoms (interrupt storm) fixing it would
> > > > > > > > > > also
> > > > > > > > > > fix the t490
> > > > > > > > > > and the ineffective EOI bug looked like a great
> > > > > > > > > > candidate for
> > > > > > > > > > being the root cause.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Adding Hans to the list.
> > > > > > > > > 
> > > > > > > > > IIUC in the t490s case the problem lies with the
> > > > > > > > > hardware
> > > > > > > > > itself.
> > > > > > > > > Hans, is that correct?
> > > > > > > > 
> > > > > > > > More or less. AFAIK / have been told by Lenovo it is an
> > > > > > > > issue with
> > > > > > > > the configuration of the inerrupt-type of the GPIO pin
> > > > > > > > used
> > > > > > > > for the
> > > > > > > > IRQ, which is a firmware issue which could be fixed by
> > > > > > > > a
> > > > > > > > BIOS update
> > > > > > > > (the pin is setup as a direct-irq pin for the APIC, so
> > > > > > > > the
> > > > > > > > OS has no
> > > > > > > > control of the IRQ type since with APIC irqs this is
> > > > > > > > all
> > > > > > > > supposed to
> > > > > > > > be setup properly before hand).
> > > > > > > > 
> > > > > > > > But it is a model specific issue, if we denylist IRQ
> > > > > > > > usage
> > > > > > > > on this
> > > > > > > > Lenovo model (and probably a few others) then we should
> > > > > > > > be
> > > > > > > > able to
> > > > > > > > restore the IRQ code to normal functionality for all
> > > > > > > > other
> > > > > > > > device
> > > > > > > > models which declare an IRQ in their resource tables.
> > > > > > > I can do that with a quirk, but how do I identify the
> > > > > > > device?  TPM
> > > > > > > manufacturer and version? or do I have to use something
> > > > > > > like
> > > > > > > the ACPI
> > > > > > > bios version?
> > > > > > 
> > > > > > I'm not sure if the TPM ids are unique to one model/series
> > > > > > of
> > > > > > laptops.
> > > > > > 
> > > > > > So my idea for this was to match on DMI strings,
> > > > > > specifically
> > > > > > use a DMI match on the DMI_SYS_VENDOR and
> > > > > > DMI_PRODUCT_VERSION
> > > > > > strings (normally one would use DMI_PRODUCT_NAME but for
> > > > > > Lenovo
> > > > > > devices the string which you expect to be in
> > > > > > DMI_PRODUCT_NAME
> > > > > > is actually in DMI_PRODUCT_VERSION).
> > > > > > 
> > > > > > You can easily get the strings for your device by doing:
> > > > > > 
> > > > > > cat /sys/class/dmi/id/sys_vendor
> > > > > > cat /sys/class/dmi/id/product_version
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Hans
> > > > > Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
> > > > > if the bios is older than the fixed bios? Has Lenovo
> > > > > released the fixed bios?
> > > > 
> > > > Maybe, the fixed BIOS-es which I have seen (for the X1C8,
> > > > broken BIOS was a pre-production BIOS) "fixed" this by
> > > > no longer listing an IRQ in the ACPI resources for the TPM.
> > > > 
> > > > Which means that the new BIOS still being on the deny list
> > > > does not matter since the IRQ support won't work anyways as
> > > > we no longer get an IRQ assigned.
> > > > 
> > > > So I don't think this is necessary and it will just complicate
> > > > things unnecessarily. This whole saga has already taken way
> > > > too long to fix. So IMHO the simplest fix where we just deny
> > > > list the broken models independent of BIOS versions and move
> > > > on seems best.
> > > > 
> > > > Regards,
> > > > 
> > > > Hans
> > > 
> > > This worked for me:
> > > 
> > > diff --git a/drivers/char/tpm/tpm_tis.c
> > > b/drivers/char/tpm/tpm_tis.c
> > > index 0b214963539d..abe674d1de6d 100644
> > > --- a/drivers/char/tpm/tpm_tis.c
> > > +++ b/drivers/char/tpm/tpm_tis.c
> > > @@ -27,6 +27,7 @@
> > >  #include <linux/of.h>
> > >  #include <linux/of_device.h>
> > >  #include <linux/kernel.h>
> > > +#include <linux/dmi.h>
> > >  #include "tpm.h"
> > >  #include "tpm_tis_core.h"
> > > 
> > > @@ -63,6 +64,26 @@ module_param(force, bool, 0444);
> > >  MODULE_PARM_DESC(force, "Force device probe rather than using
> > > ACPI
> > > entry");
> > >  #endif
> > > 
> > > +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> > > +{
> > > +       pr_notice("tpm_tis: %s detected: disabling
> > > interrupts.\n", d-
> > > > ident);
> > > +       interrupts = false;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> > > +       {
> > > +               .callback = tpm_tis_disable_irq,
> > > +               .ident = "ThinkPad T490s",
> > > +               .matches = {
> > > +                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> > > +                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad
> > > T490s"),
> > > +               },
> > > +       },
> > > +       {}
> > > +};
> > > +
> > >  #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
> > >  static int has_hid(struct acpi_device *dev, const char *hid)
> > >  {
> > > @@ -192,6 +213,8 @@ static int tpm_tis_init(struct device *dev,
> > > struct tpm_info *tpm_info)
> > >         int irq = -1;
> > >         int rc;
> > > 
> > > +       dmi_check_system(tpm_tis_dmi_table);
> > > +
> > >         rc = check_acpi_tpm2(dev);
> > >         if (rc)
> > >                 return rc;
> > 
> > This looks OK to me with the caveat that anyone on one of these
> > systems
> > has no way to enable interrupts again if they think they have a
> > fixed
> > bios.  What about making interrupts a tristate with the default
> > value
> > -1?  Then in the dmi check, if we see -1 we set it to 0 but if we
> > see 1
> > (the user has specified interrupts=1 on the module insert line) we
> > leave it?
> > 
> > James
> 
> like this?

Yes, that works, I think.

Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>

James


> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
> index 0b214963539d..10c46cb26c5a 100644
> --- a/drivers/char/tpm/tpm_tis.c
> +++ b/drivers/char/tpm/tpm_tis.c
> @@ -27,6 +27,7 @@
>  #include <linux/of.h>
>  #include <linux/of_device.h>
>  #include <linux/kernel.h>
> +#include <linux/dmi.h>
>  #include "tpm.h"
>  #include "tpm_tis_core.h"
> 
> @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy
> *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
>         return container_of(data, struct tpm_tis_tcg_phy, priv);
>  }
> 
> -static bool interrupts = true;
> -module_param(interrupts, bool, 0444);
> +static int interrupts = -1;
> +module_param(interrupts, int, 0444);
>  MODULE_PARM_DESC(interrupts, "Enable interrupts");
> 
>  static bool itpm;
> @@ -63,6 +64,27 @@ module_param(force, bool, 0444);
>  MODULE_PARM_DESC(force, "Force device probe rather than using ACPI
> entry");
>  #endif
> 
> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
> +{
> +       pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d-
> >ident);
> +       if (interrupts == -1)
> +               interrupts = 0;
> +
> +       return 0;
> +}
> +
> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
> +       {
> +               .callback = tpm_tis_disable_irq,
> +               .ident = "ThinkPad T490s",
> +               .matches = {
> +                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad
> T490s"),
> +               },
> +       },
> +       {}
> +};
> +
>  #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>  static int has_hid(struct acpi_device *dev, const char *hid)
>  {
> @@ -192,6 +214,8 @@ static int tpm_tis_init(struct device *dev,
> struct tpm_info *tpm_info)
>         int irq = -1;
>         int rc;
> 
> +       dmi_check_system(tpm_tis_dmi_table);
> +
>         rc = check_acpi_tpm2(dev);
>         if (rc)
>                 return rc;
>
Jerry Snitselaar Oct. 15, 2020, 7:16 p.m. UTC | #20
James Bottomley @ 2020-10-15 11:57 MST:

> On Thu, 2020-10-15 at 11:48 -0700, Jerry Snitselaar wrote:
>> James Bottomley @ 2020-10-15 08:36 MST:
>> 
>> > On Wed, 2020-10-14 at 13:58 -0700, Jerry Snitselaar wrote:
>> > > Hans de Goede @ 2020-10-14 09:46 MST:
>> > > 
>> > > > Hi,
>> > > > 
>> > > > On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
>> > > > > Hans de Goede @ 2020-10-14 09:04 MST:
>> > > > > 
>> > > > > > Hi,
>> > > > > > 
>> > > > > > On 10/14/20 5:23 PM, James Bottomley wrote:
>> > > > > > > On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
>> > > > > > > > On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
>> > > > > > > > > James Bottomley @ 2020-10-13 08:24 MST:
>> > > > > > > > > > On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar
>> > > > > > > > > > wrote:
>> > > > > > > > > > > Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>> > > > > > > [...]
>> > > > > > > > > > > >     Jerry, once you have some bandwidth (no
>> > > > > > > > > > > > rush,
>> > > > > > > > > > > > does not land
>> > > > > > > > > > > > before rc2), it would be great that if you
>> > > > > > > > > > > > could
>> > > > > > > > > > > > try this.
>> > > > > > > > > > > > I'm emphasizing this just because of the
>> > > > > > > > > > > > intersection. I
>> > > > > > > > > > > > think it would also make senset to get tested-
>> > > > > > > > > > > > by
>> > > > > > > > > > > > from Nayna.
>> > > > > > > > > > > 
>> > > > > > > > > > > I will run some tests on some other systems I
>> > > > > > > > > > > have
>> > > > > > > > > > > access to.
>> > > > > > > > > > > As noted in the other email I did a quick test
>> > > > > > > > > > > with a
>> > > > > > > > > > > t490s
>> > > > > > > > > > > with an older bios that exhibits the problem
>> > > > > > > > > > > originally
>> > > > > > > > > > > reported when Stefan's patch enabled interrupts.
>> > > > > > > > > > 
>> > > > > > > > > > Well, it means there's still some other problem.  I
>> > > > > > > > > > was
>> > > > > > > > > > hoping
>> > > > > > > > > > that because the rainbow pass system originally
>> > > > > > > > > > exhibited the
>> > > > > > > > > > same symptoms (interrupt storm) fixing it would
>> > > > > > > > > > also
>> > > > > > > > > > fix the t490
>> > > > > > > > > > and the ineffective EOI bug looked like a great
>> > > > > > > > > > candidate for
>> > > > > > > > > > being the root cause.
>> > > > > > > > > > 
>> > > > > > > > > 
>> > > > > > > > > Adding Hans to the list.
>> > > > > > > > > 
>> > > > > > > > > IIUC in the t490s case the problem lies with the
>> > > > > > > > > hardware
>> > > > > > > > > itself.
>> > > > > > > > > Hans, is that correct?
>> > > > > > > > 
>> > > > > > > > More or less. AFAIK / have been told by Lenovo it is an
>> > > > > > > > issue with
>> > > > > > > > the configuration of the inerrupt-type of the GPIO pin
>> > > > > > > > used
>> > > > > > > > for the
>> > > > > > > > IRQ, which is a firmware issue which could be fixed by
>> > > > > > > > a
>> > > > > > > > BIOS update
>> > > > > > > > (the pin is setup as a direct-irq pin for the APIC, so
>> > > > > > > > the
>> > > > > > > > OS has no
>> > > > > > > > control of the IRQ type since with APIC irqs this is
>> > > > > > > > all
>> > > > > > > > supposed to
>> > > > > > > > be setup properly before hand).
>> > > > > > > > 
>> > > > > > > > But it is a model specific issue, if we denylist IRQ
>> > > > > > > > usage
>> > > > > > > > on this
>> > > > > > > > Lenovo model (and probably a few others) then we should
>> > > > > > > > be
>> > > > > > > > able to
>> > > > > > > > restore the IRQ code to normal functionality for all
>> > > > > > > > other
>> > > > > > > > device
>> > > > > > > > models which declare an IRQ in their resource tables.
>> > > > > > > I can do that with a quirk, but how do I identify the
>> > > > > > > device?  TPM
>> > > > > > > manufacturer and version? or do I have to use something
>> > > > > > > like
>> > > > > > > the ACPI
>> > > > > > > bios version?
>> > > > > > 
>> > > > > > I'm not sure if the TPM ids are unique to one model/series
>> > > > > > of
>> > > > > > laptops.
>> > > > > > 
>> > > > > > So my idea for this was to match on DMI strings,
>> > > > > > specifically
>> > > > > > use a DMI match on the DMI_SYS_VENDOR and
>> > > > > > DMI_PRODUCT_VERSION
>> > > > > > strings (normally one would use DMI_PRODUCT_NAME but for
>> > > > > > Lenovo
>> > > > > > devices the string which you expect to be in
>> > > > > > DMI_PRODUCT_NAME
>> > > > > > is actually in DMI_PRODUCT_VERSION).
>> > > > > > 
>> > > > > > You can easily get the strings for your device by doing:
>> > > > > > 
>> > > > > > cat /sys/class/dmi/id/sys_vendor
>> > > > > > cat /sys/class/dmi/id/product_version
>> > > > > > 
>> > > > > > Regards,
>> > > > > > 
>> > > > > > Hans
>> > > > > Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
>> > > > > if the bios is older than the fixed bios? Has Lenovo
>> > > > > released the fixed bios?
>> > > > 
>> > > > Maybe, the fixed BIOS-es which I have seen (for the X1C8,
>> > > > broken BIOS was a pre-production BIOS) "fixed" this by
>> > > > no longer listing an IRQ in the ACPI resources for the TPM.
>> > > > 
>> > > > Which means that the new BIOS still being on the deny list
>> > > > does not matter since the IRQ support won't work anyways as
>> > > > we no longer get an IRQ assigned.
>> > > > 
>> > > > So I don't think this is necessary and it will just complicate
>> > > > things unnecessarily. This whole saga has already taken way
>> > > > too long to fix. So IMHO the simplest fix where we just deny
>> > > > list the broken models independent of BIOS versions and move
>> > > > on seems best.
>> > > > 
>> > > > Regards,
>> > > > 
>> > > > Hans
>> > > 
>> > > This worked for me:
>> > > 
>> > > diff --git a/drivers/char/tpm/tpm_tis.c
>> > > b/drivers/char/tpm/tpm_tis.c
>> > > index 0b214963539d..abe674d1de6d 100644
>> > > --- a/drivers/char/tpm/tpm_tis.c
>> > > +++ b/drivers/char/tpm/tpm_tis.c
>> > > @@ -27,6 +27,7 @@
>> > >  #include <linux/of.h>
>> > >  #include <linux/of_device.h>
>> > >  #include <linux/kernel.h>
>> > > +#include <linux/dmi.h>
>> > >  #include "tpm.h"
>> > >  #include "tpm_tis_core.h"
>> > > 
>> > > @@ -63,6 +64,26 @@ module_param(force, bool, 0444);
>> > >  MODULE_PARM_DESC(force, "Force device probe rather than using
>> > > ACPI
>> > > entry");
>> > >  #endif
>> > > 
>> > > +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
>> > > +{
>> > > +       pr_notice("tpm_tis: %s detected: disabling
>> > > interrupts.\n", d-
>> > > > ident);
>> > > +       interrupts = false;
>> > > +
>> > > +       return 0;
>> > > +}
>> > > +
>> > > +static const struct dmi_system_id tpm_tis_dmi_table[] = {
>> > > +       {
>> > > +               .callback = tpm_tis_disable_irq,
>> > > +               .ident = "ThinkPad T490s",
>> > > +               .matches = {
>> > > +                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>> > > +                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad
>> > > T490s"),
>> > > +               },
>> > > +       },
>> > > +       {}
>> > > +};
>> > > +
>> > >  #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>> > >  static int has_hid(struct acpi_device *dev, const char *hid)
>> > >  {
>> > > @@ -192,6 +213,8 @@ static int tpm_tis_init(struct device *dev,
>> > > struct tpm_info *tpm_info)
>> > >         int irq = -1;
>> > >         int rc;
>> > > 
>> > > +       dmi_check_system(tpm_tis_dmi_table);
>> > > +
>> > >         rc = check_acpi_tpm2(dev);
>> > >         if (rc)
>> > >                 return rc;
>> > 
>> > This looks OK to me with the caveat that anyone on one of these
>> > systems
>> > has no way to enable interrupts again if they think they have a
>> > fixed
>> > bios.  What about making interrupts a tristate with the default
>> > value
>> > -1?  Then in the dmi check, if we see -1 we set it to 0 but if we
>> > see 1
>> > (the user has specified interrupts=1 on the module insert line) we
>> > leave it?
>> > 
>> > James
>> 
>> like this?
>
> Yes, that works, I think.
>
> Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>
> James
>

I'll send a real patch in a just a bit. The pr_notice needs to move into the
if block, otherwise it will log that it is disabling interrupts when the
user has forced enabling.

Regards,
Jerry

>
>> diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
>> index 0b214963539d..10c46cb26c5a 100644
>> --- a/drivers/char/tpm/tpm_tis.c
>> +++ b/drivers/char/tpm/tpm_tis.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/of.h>
>>  #include <linux/of_device.h>
>>  #include <linux/kernel.h>
>> +#include <linux/dmi.h>
>>  #include "tpm.h"
>>  #include "tpm_tis_core.h"
>> 
>> @@ -49,8 +50,8 @@ static inline struct tpm_tis_tcg_phy
>> *to_tpm_tis_tcg_phy(struct tpm_tis_data *da
>>         return container_of(data, struct tpm_tis_tcg_phy, priv);
>>  }
>> 
>> -static bool interrupts = true;
>> -module_param(interrupts, bool, 0444);
>> +static int interrupts = -1;
>> +module_param(interrupts, int, 0444);
>>  MODULE_PARM_DESC(interrupts, "Enable interrupts");
>> 
>>  static bool itpm;
>> @@ -63,6 +64,27 @@ module_param(force, bool, 0444);
>>  MODULE_PARM_DESC(force, "Force device probe rather than using ACPI
>> entry");
>>  #endif
>> 
>> +static int tpm_tis_disable_irq(const struct dmi_system_id *d)
>> +{
>> +       pr_notice("tpm_tis: %s detected: disabling interrupts.\n", d-
>> >ident);
>> +       if (interrupts == -1)
>> +               interrupts = 0;
>> +
>> +       return 0;
>> +}
>> +
>> +static const struct dmi_system_id tpm_tis_dmi_table[] = {
>> +       {
>> +               .callback = tpm_tis_disable_irq,
>> +               .ident = "ThinkPad T490s",
>> +               .matches = {
>> +                       DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>> +                       DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad
>> T490s"),
>> +               },
>> +       },
>> +       {}
>> +};
>> +
>>  #if defined(CONFIG_PNP) && defined(CONFIG_ACPI)
>>  static int has_hid(struct acpi_device *dev, const char *hid)
>>  {
>> @@ -192,6 +214,8 @@ static int tpm_tis_init(struct device *dev,
>> struct tpm_info *tpm_info)
>>         int irq = -1;
>>         int rc;
>> 
>> +       dmi_check_system(tpm_tis_dmi_table);
>> +
>>         rc = check_acpi_tpm2(dev);
>>         if (rc)
>>                 return rc;
>>
Jarkko Sakkinen Oct. 18, 2020, 5:34 a.m. UTC | #21
On Tue, Oct 13, 2020 at 04:23:54AM +0300, Jarkko Sakkinen wrote:
> On Sun, Oct 11, 2020 at 10:39:07PM -0700, Jerry Snitselaar wrote:
> > 
> > James Bottomley @ 2020-10-01 11:09 MST:
> > 
> > > The current state of the TIS TPM is that interrupts have been globally
> > > disabled by various changes.  The problems we got reported the last
> > > time they were enabled was interrupt storms.  With my own TIS TPM,
> > > I've found that this is caused because my TPM doesn't do legacy
> > > cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
> > > requires any TIS TPM without legacy cycles not to act on any write to
> > > an interrupt register unless the locality is enabled.  This means if
> > > an interrupt fires after we relinquish the locality, the TPM_EOI in
> > > the interrupt routine is ineffective meaning the same interrupt
> > > triggers over and over again.  This problem also means we can have
> > > trouble setting up interrupts on TIS TPMs because the current init
> > > code does the setup before the locality is claimed for the first time.
> > >
> > > James
> > >
> > 
> > 
> > I tested initially with the following commits reverted:
> > 
> > aa4a63dd9816 tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's" | 2020-01-06 | (Stefan Berger)
> > dda8b2af395b tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts" | 2020-01-06 | (Stefan Berger)
> > 
> > The laptop doesn't become unusable, but I lose the trackpad and get the following:
> > 
> > [    3.070501] irq 31: nobody cared (try booting with the "irqpoll" option)
> > [    3.070504] CPU: 2 PID: 251 Comm: rngd Not tainted 5.8.13-201.local.fc32.x86_64 #1
> > [    3.070504] Hardware name: LENOVO 20NYS7K90F/20NYS7K90F, BIOS N2JET83W (1.61 ) 11/22/2019
> > [    3.070505] Call Trace:
> > [    3.070511]  dump_stack+0x6b/0x88
> > [    3.070514]  __report_bad_irq+0x35/0xa7
> > [    3.070516]  note_interrupt.cold+0xb/0x6a
> > [    3.070518]  handle_irq_event+0x88/0x8a
> > [    3.070519]  handle_fasteoi_irq+0x78/0x1c0
> > [    3.070522]  common_interrupt+0x68/0x140
> > [    3.070524]  ? asm_common_interrupt+0x8/0x40
> > [    3.070525]  asm_common_interrupt+0x1e/0x40
> > [    3.070527] RIP: 0033:0x7f2eea3249b5
> > [    3.070529] Code: e0 48 c1 e8 3c 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 37 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 1e 83 e0 01 48 31 45 f0 <48> 8b 45 e0 48 c1 e8 1b 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8
> > [    3.070529] RSP: 002b:00007f2ee3ffebc0 EFLAGS: 00000202
> > [    3.070530] RAX: 0000000000000001 RBX: 000000000000000a RCX: 000000000000002e
> > [    3.070531] RDX: 0463400000000000 RSI: 0000000000000000 RDI: 0000000000000001
> > [    3.070532] RBP: 00007f2ee3ffec10 R08: 0000000000000000 R09: 0000000000000035
> > [    3.070532] R10: 00007ffde716f080 R11: 00007ffde716f080 R12: 00007f2edc004c00
> > [    3.070533] R13: 00007ffde700a54f R14: 00007f2ee3ffed10 R15: 00005598a41e3680
> > [    3.070534] handlers:
> > [    3.070537] [<000000007b6f3232>] tis_int_handler
> > [    3.070538] Disabling IRQ #31
> > ...
> > [   14.956342] irq 16: nobody cared (try booting with the "irqpoll" option)
> > [   14.956344] CPU: 0 PID: 1013 Comm: rngd Not tainted 5.8.13-201.local.fc32.x86_64 #1
> > [   14.956344] Hardware name: LENOVO 20NYS7K90F/20NYS7K90F, BIOS N2JET83W (1.61 ) 11/22/2019
> > [   14.956345] Call Trace:
> > [   14.956350]  dump_stack+0x6b/0x88
> > [   14.956353]  __report_bad_irq+0x35/0xa7
> > [   14.956354]  note_interrupt.cold+0xb/0x6a
> > [   14.956355]  handle_irq_event+0x88/0x8a
> > [   14.956356]  handle_fasteoi_irq+0x78/0x1c0
> > [   14.956358]  common_interrupt+0x68/0x140
> > [   14.956360]  ? asm_common_interrupt+0x8/0x40
> > [   14.956361]  asm_common_interrupt+0x1e/0x40
> > [   14.956362] RIP: 0033:0x7fcaff4399d3
> > [   14.956363] Code: e0 48 c1 e8 1e 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 1b 83 e0 01 48 31 45 f0 48 8b 45 e0 48 c1 e8 16 83 e0 01 48 31 45 f0 <48> d1 65 e0 48 8b 45 f0 48 31 45 e0 83 45 d4 01 83 7d d4 40 0f 86
> > [   14.956364] RSP: 002b:00007fcafe544bc0 EFLAGS: 00000246
> > [   14.956364] RAX: 0000000000000000 RBX: 000000000000000a RCX: 0000000000000026
> > [   14.956365] RDX: 000df20000000000 RSI: 0000000000000000 RDI: 0000000000000001
> > [   14.956365] RBP: 00007fcafe544c10 R08: 0000000000000000 R09: 0000000000000035
> > [   14.956366] R10: 00007ffc30bd2080 R11: 00007ffc30bd2080 R12: 00007fcaf0004c00
> > [   14.956366] R13: 00007fcaf0004c00 R14: 00007fcaf0000b60 R15: 0000560216fcf0f2
> > [   14.956367] handlers:
> > [   14.956370] [<0000000024c0571e>] i801_isr [i2c_i801]
> > [   14.956371] Disabling IRQ #16
> > 
> > /proc/interrupts shows:
> > 
> >   16:     100000          0          0          0          0          0          0          0  IR-IO-APIC   16-fasteoi   i801_smbus
> >   31:          0          0     100000          0          0          0          0          0  IR-IO-APIC   31-fasteoi   tpm0
> > 
> > 
> > I also get this behavior with your patchset applied. If I boot with
> > tpm_tis.interrupts=0 it behaves.  The thought from Hans is to look at
> > the dmi info and key off the vendor being Lenovo and bios date to
> > decide if interrupts should be disabled. Since Dan ran also into this
> > with something internal at Intel I don't know if that will be
> > sufficient.
> > 
> > I hope to look over this patchset tomorrow.
> > 
> > Regards,
> > Jerry
> 
> I'm sorry, I received this email only after pulling the latest with fdm.

So, this is the main blocker to take these patches.

/Jarkko
Jarkko Sakkinen Oct. 18, 2020, 9:05 p.m. UTC | #22
On Tue, Oct 13, 2020 at 08:15:36AM -0700, Jerry Snitselaar wrote:
> 
> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> 
> > On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
> >> The current state of the TIS TPM is that interrupts have been globally
> >> disabled by various changes.  The problems we got reported the last
> >> time they were enabled was interrupt storms.  With my own TIS TPM,
> >> I've found that this is caused because my TPM doesn't do legacy
> >> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
> >> requires any TIS TPM without legacy cycles not to act on any write to
> >> an interrupt register unless the locality is enabled.  This means if
> >> an interrupt fires after we relinquish the locality, the TPM_EOI in
> >> the interrupt routine is ineffective meaning the same interrupt
> >> triggers over and over again.  This problem also means we can have
> >> trouble setting up interrupts on TIS TPMs because the current init
> >> code does the setup before the locality is claimed for the first time.
> >> 
> >> James
> >
> > You should consider expanding the audience. Jerry, once you have some
> > bandwidth (no rush, does not land before rc2), it would be great that if
> > you could try this. I'm emphasizing this just because of the
> > intersection. I think it would also make senset to get tested-by from
> > Nayna.
> 
> I will run some tests on some other systems I have access to. As noted
> in the other email I did a quick test with a t490s with an older bios
> that exhibits the problem originally reported when Stefan's patch
> enabled interrupts.

Thank you. As said, I can make a pull request to rc2 or even rc3, if
needed.

/Jarkko
Jarkko Sakkinen Oct. 18, 2020, 9:09 p.m. UTC | #23
On Thu, Oct 15, 2020 at 09:38:24AM +0200, Hans de Goede wrote:
> Hi,
> 
> On 10/14/20 10:58 PM, Jerry Snitselaar wrote:
> > 
> > Hans de Goede @ 2020-10-14 09:46 MST:
> > 
> > > Hi,
> > > 
> > > On 10/14/20 6:34 PM, Jerry Snitselaar wrote:
> > > > Hans de Goede @ 2020-10-14 09:04 MST:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > On 10/14/20 5:23 PM, James Bottomley wrote:
> > > > > > On Wed, 2020-10-14 at 17:03 +0200, Hans de Goede wrote:
> > > > > > > On 10/13/20 6:05 PM, Jerry Snitselaar wrote:
> > > > > > > > James Bottomley @ 2020-10-13 08:24 MST:
> > > > > > > > > On Tue, 2020-10-13 at 08:15 -0700, Jerry Snitselaar wrote:
> > > > > > > > > > Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> > > > > > [...]
> > > > > > > > > > >      Jerry, once you have some bandwidth (no rush, does not land
> > > > > > > > > > > before rc2), it would be great that if you could try this.
> > > > > > > > > > > I'm emphasizing this just because of the intersection. I
> > > > > > > > > > > think it would also make senset to get tested-by from Nayna.
> > > > > > > > > > 
> > > > > > > > > > I will run some tests on some other systems I have access to.
> > > > > > > > > > As noted in the other email I did a quick test with a t490s
> > > > > > > > > > with an older bios that exhibits the problem originally
> > > > > > > > > > reported when Stefan's patch enabled interrupts.
> > > > > > > > > 
> > > > > > > > > Well, it means there's still some other problem.  I was hoping
> > > > > > > > > that because the rainbow pass system originally exhibited the
> > > > > > > > > same symptoms (interrupt storm) fixing it would also fix the t490
> > > > > > > > > and the ineffective EOI bug looked like a great candidate for
> > > > > > > > > being the root cause.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Adding Hans to the list.
> > > > > > > > 
> > > > > > > > IIUC in the t490s case the problem lies with the hardware itself.
> > > > > > > > Hans, is that correct?
> > > > > > > 
> > > > > > > More or less. AFAIK / have been told by Lenovo it is an issue with
> > > > > > > the configuration of the inerrupt-type of the GPIO pin used for the
> > > > > > > IRQ, which is a firmware issue which could be fixed by a BIOS update
> > > > > > > (the pin is setup as a direct-irq pin for the APIC, so the OS has no
> > > > > > > control of the IRQ type since with APIC irqs this is all supposed to
> > > > > > > be setup properly before hand).
> > > > > > > 
> > > > > > > But it is a model specific issue, if we denylist IRQ usage on this
> > > > > > > Lenovo model (and probably a few others) then we should be able to
> > > > > > > restore the IRQ code to normal functionality for all other device
> > > > > > > models which declare an IRQ in their resource tables.
> > > > > > I can do that with a quirk, but how do I identify the device?  TPM
> > > > > > manufacturer and version? or do I have to use something like the ACPI
> > > > > > bios version?
> > > > > 
> > > > > I'm not sure if the TPM ids are unique to one model/series of laptops.
> > > > > 
> > > > > So my idea for this was to match on DMI strings, specifically
> > > > > use a DMI match on the DMI_SYS_VENDOR and DMI_PRODUCT_VERSION
> > > > > strings (normally one would use DMI_PRODUCT_NAME but for Lenovo
> > > > > devices the string which you expect to be in DMI_PRODUCT_NAME
> > > > > is actually in DMI_PRODUCT_VERSION).
> > > > > 
> > > > > You can easily get the strings for your device by doing:
> > > > > 
> > > > > cat /sys/class/dmi/id/sys_vendor
> > > > > cat /sys/class/dmi/id/product_version
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Hans
> > > > Plus use dmi_get_date(DMI_BIOS_DATE,...) to check
> > > > if the bios is older than the fixed bios? Has Lenovo
> > > > released the fixed bios?
> > > 
> > > Maybe, the fixed BIOS-es which I have seen (for the X1C8,
> > > broken BIOS was a pre-production BIOS) "fixed" this by
> > > no longer listing an IRQ in the ACPI resources for the TPM.
> > > 
> > > Which means that the new BIOS still being on the deny list
> > > does not matter since the IRQ support won't work anyways as
> > > we no longer get an IRQ assigned.
> > > 
> > > So I don't think this is necessary and it will just complicate
> > > things unnecessarily. This whole saga has already taken way
> > > too long to fix. So IMHO the simplest fix where we just deny
> > > list the broken models independent of BIOS versions and move
> > > on seems best.
> > > 
> > > Regards,
> > > 
> > > Hans
> > 
> > This worked for me:
> 
> That looks good to me, can you submit this upstream please ?
> 
> If you Cc me I'll give it my Reviewed-by.
> 
> Regards,
> 
> Hans

Yes, this does look like sustainable patch.

PS. Email has been migration ongoing. That's the reason for inconsistent
responses.

/Jarkko
Jerry Snitselaar Oct. 20, 2020, 11:10 p.m. UTC | #24
Jarkko Sakkinen @ 2020-10-18 14:05 MST:

> On Tue, Oct 13, 2020 at 08:15:36AM -0700, Jerry Snitselaar wrote:
>> 
>> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>> 
>> > On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
>> >> The current state of the TIS TPM is that interrupts have been globally
>> >> disabled by various changes.  The problems we got reported the last
>> >> time they were enabled was interrupt storms.  With my own TIS TPM,
>> >> I've found that this is caused because my TPM doesn't do legacy
>> >> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
>> >> requires any TIS TPM without legacy cycles not to act on any write to
>> >> an interrupt register unless the locality is enabled.  This means if
>> >> an interrupt fires after we relinquish the locality, the TPM_EOI in
>> >> the interrupt routine is ineffective meaning the same interrupt
>> >> triggers over and over again.  This problem also means we can have
>> >> trouble setting up interrupts on TIS TPMs because the current init
>> >> code does the setup before the locality is claimed for the first time.
>> >> 
>> >> James
>> >
>> > You should consider expanding the audience. Jerry, once you have some
>> > bandwidth (no rush, does not land before rc2), it would be great that if
>> > you could try this. I'm emphasizing this just because of the
>> > intersection. I think it would also make senset to get tested-by from
>> > Nayna.
>> 
>> I will run some tests on some other systems I have access to. As noted
>> in the other email I did a quick test with a t490s with an older bios
>> that exhibits the problem originally reported when Stefan's patch
>> enabled interrupts.
>
> Thank you. As said, I can make a pull request to rc2 or even rc3, if
> needed.
>
> /Jarkko

So outside of the t490s I have access to, it looks like the nuc5 with
tpm2.0 device, and and older lenovo D30 with a tpm1.2 device both are
not using interrupts. I'm digging around to see if I can find some
other systems that I can test interrupts on.

Regards,
Jerry
Jarkko Sakkinen Oct. 24, 2020, 12:20 p.m. UTC | #25
On Tue, Oct 20, 2020 at 04:10:42PM -0700, Jerry Snitselaar wrote:
> 
> Jarkko Sakkinen @ 2020-10-18 14:05 MST:
> 
> > On Tue, Oct 13, 2020 at 08:15:36AM -0700, Jerry Snitselaar wrote:
> >> 
> >> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> >> 
> >> > On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
> >> >> The current state of the TIS TPM is that interrupts have been globally
> >> >> disabled by various changes.  The problems we got reported the last
> >> >> time they were enabled was interrupt storms.  With my own TIS TPM,
> >> >> I've found that this is caused because my TPM doesn't do legacy
> >> >> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
> >> >> requires any TIS TPM without legacy cycles not to act on any write to
> >> >> an interrupt register unless the locality is enabled.  This means if
> >> >> an interrupt fires after we relinquish the locality, the TPM_EOI in
> >> >> the interrupt routine is ineffective meaning the same interrupt
> >> >> triggers over and over again.  This problem also means we can have
> >> >> trouble setting up interrupts on TIS TPMs because the current init
> >> >> code does the setup before the locality is claimed for the first time.
> >> >> 
> >> >> James
> >> >
> >> > You should consider expanding the audience. Jerry, once you have some
> >> > bandwidth (no rush, does not land before rc2), it would be great that if
> >> > you could try this. I'm emphasizing this just because of the
> >> > intersection. I think it would also make senset to get tested-by from
> >> > Nayna.
> >> 
> >> I will run some tests on some other systems I have access to. As noted
> >> in the other email I did a quick test with a t490s with an older bios
> >> that exhibits the problem originally reported when Stefan's patch
> >> enabled interrupts.
> >
> > Thank you. As said, I can make a pull request to rc2 or even rc3, if
> > needed.
> >
> > /Jarkko
> 
> So outside of the t490s I have access to, it looks like the nuc5 with
> tpm2.0 device, and and older lenovo D30 with a tpm1.2 device both are
> not using interrupts. I'm digging around to see if I can find some
> other systems that I can test interrupts on.
> 
> Regards,
> Jerry

I'm going to test with this C720P chromebook, which has a long standing
bug in the kernel bugzilla. I got it a while ago but it's stuck in the
boot.

If that doesn't boot, I'll pick up old Ivylake NUC from the office,
which has TPM 1.2 and bit newer TPM 2.0 NUC. Should anyway pick them,
have not used them for testing for a while because of pandemia.

/Jarkko
Jerry Snitselaar Oct. 26, 2020, 6:29 p.m. UTC | #26
Jarkko Sakkinen @ 2020-10-24 05:20 MST:

> On Tue, Oct 20, 2020 at 04:10:42PM -0700, Jerry Snitselaar wrote:
>> 
>> Jarkko Sakkinen @ 2020-10-18 14:05 MST:
>> 
>> > On Tue, Oct 13, 2020 at 08:15:36AM -0700, Jerry Snitselaar wrote:
>> >> 
>> >> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
>> >> 
>> >> > On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
>> >> >> The current state of the TIS TPM is that interrupts have been globally
>> >> >> disabled by various changes.  The problems we got reported the last
>> >> >> time they were enabled was interrupt storms.  With my own TIS TPM,
>> >> >> I've found that this is caused because my TPM doesn't do legacy
>> >> >> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
>> >> >> requires any TIS TPM without legacy cycles not to act on any write to
>> >> >> an interrupt register unless the locality is enabled.  This means if
>> >> >> an interrupt fires after we relinquish the locality, the TPM_EOI in
>> >> >> the interrupt routine is ineffective meaning the same interrupt
>> >> >> triggers over and over again.  This problem also means we can have
>> >> >> trouble setting up interrupts on TIS TPMs because the current init
>> >> >> code does the setup before the locality is claimed for the first time.
>> >> >> 
>> >> >> James
>> >> >
>> >> > You should consider expanding the audience. Jerry, once you have some
>> >> > bandwidth (no rush, does not land before rc2), it would be great that if
>> >> > you could try this. I'm emphasizing this just because of the
>> >> > intersection. I think it would also make senset to get tested-by from
>> >> > Nayna.
>> >> 
>> >> I will run some tests on some other systems I have access to. As noted
>> >> in the other email I did a quick test with a t490s with an older bios
>> >> that exhibits the problem originally reported when Stefan's patch
>> >> enabled interrupts.
>> >
>> > Thank you. As said, I can make a pull request to rc2 or even rc3, if
>> > needed.
>> >
>> > /Jarkko
>> 
>> So outside of the t490s I have access to, it looks like the nuc5 with
>> tpm2.0 device, and and older lenovo D30 with a tpm1.2 device both are
>> not using interrupts. I'm digging around to see if I can find some
>> other systems that I can test interrupts on.
>> 
>> Regards,
>> Jerry
>
> I'm going to test with this C720P chromebook, which has a long standing
> bug in the kernel bugzilla. I got it a while ago but it's stuck in the
> boot.
>
> If that doesn't boot, I'll pick up old Ivylake NUC from the office,
> which has TPM 1.2 and bit newer TPM 2.0 NUC. Should anyway pick them,
> have not used them for testing for a while because of pandemia.
>
> /Jarkko

My search continues through the systems in our labs. Running into the
same issue with the Dells. Looking at the TCPA table it looks like
they aren't set up for interrupts as well. I have some tpm2.0 systems
to still try.

Jerry
Jarkko Sakkinen Oct. 27, 2020, 5:14 p.m. UTC | #27
On Mon, Oct 26, 2020 at 11:29:46AM -0700, Jerry Snitselaar wrote:
> 
> Jarkko Sakkinen @ 2020-10-24 05:20 MST:
> 
> > On Tue, Oct 20, 2020 at 04:10:42PM -0700, Jerry Snitselaar wrote:
> >> 
> >> Jarkko Sakkinen @ 2020-10-18 14:05 MST:
> >> 
> >> > On Tue, Oct 13, 2020 at 08:15:36AM -0700, Jerry Snitselaar wrote:
> >> >> 
> >> >> Jarkko Sakkinen @ 2020-10-12 18:17 MST:
> >> >> 
> >> >> > On Thu, Oct 01, 2020 at 11:09:20AM -0700, James Bottomley wrote:
> >> >> >> The current state of the TIS TPM is that interrupts have been globally
> >> >> >> disabled by various changes.  The problems we got reported the last
> >> >> >> time they were enabled was interrupt storms.  With my own TIS TPM,
> >> >> >> I've found that this is caused because my TPM doesn't do legacy
> >> >> >> cycles, The TIS spec (chapter 6.1 "Locality Usage Per Register")
> >> >> >> requires any TIS TPM without legacy cycles not to act on any write to
> >> >> >> an interrupt register unless the locality is enabled.  This means if
> >> >> >> an interrupt fires after we relinquish the locality, the TPM_EOI in
> >> >> >> the interrupt routine is ineffective meaning the same interrupt
> >> >> >> triggers over and over again.  This problem also means we can have
> >> >> >> trouble setting up interrupts on TIS TPMs because the current init
> >> >> >> code does the setup before the locality is claimed for the first time.
> >> >> >> 
> >> >> >> James
> >> >> >
> >> >> > You should consider expanding the audience. Jerry, once you have some
> >> >> > bandwidth (no rush, does not land before rc2), it would be great that if
> >> >> > you could try this. I'm emphasizing this just because of the
> >> >> > intersection. I think it would also make senset to get tested-by from
> >> >> > Nayna.
> >> >> 
> >> >> I will run some tests on some other systems I have access to. As noted
> >> >> in the other email I did a quick test with a t490s with an older bios
> >> >> that exhibits the problem originally reported when Stefan's patch
> >> >> enabled interrupts.
> >> >
> >> > Thank you. As said, I can make a pull request to rc2 or even rc3, if
> >> > needed.
> >> >
> >> > /Jarkko
> >> 
> >> So outside of the t490s I have access to, it looks like the nuc5 with
> >> tpm2.0 device, and and older lenovo D30 with a tpm1.2 device both are
> >> not using interrupts. I'm digging around to see if I can find some
> >> other systems that I can test interrupts on.
> >> 
> >> Regards,
> >> Jerry
> >
> > I'm going to test with this C720P chromebook, which has a long standing
> > bug in the kernel bugzilla. I got it a while ago but it's stuck in the
> > boot.
> >
> > If that doesn't boot, I'll pick up old Ivylake NUC from the office,
> > which has TPM 1.2 and bit newer TPM 2.0 NUC. Should anyway pick them,
> > have not used them for testing for a while because of pandemia.
> >
> > /Jarkko
> 
> My search continues through the systems in our labs. Running into the
> same issue with the Dells. Looking at the TCPA table it looks like
> they aren't set up for interrupts as well. I have some tpm2.0 systems
> to still try.

The chromebook that I have feels like bricked, at least for the window
for testing this bug. I'll pick up the nucs tomorrow.

> Jerry

/Jarkko