diff mbox

[BISECTED] tpm CLKRUN breaks PS/2 keyboard and touchpad on Braswell system

Message ID 16609e73-e35d-4bb0-410d-e87915daba39@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Javier Martinez Canillas Dec. 18, 2017, 12:29 p.m. UTC
On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:

[snip]

> 
> James,
> 
> Can you please test the following (untested) patch on top of the other two
> mentioned patches to see if it makes a difference for you?
> 

I should had tried to at least compile the patch :)

Updated patch below:

From 370d45a34dc8914066a995a3a6d6df1953ea9f60 Mon Sep 17 00:00:00 2001
From: Javier Martinez Canillas <javierm@redhat.com>
Date: Mon, 18 Dec 2017 12:56:28 +0100
Subject: [PATCH v2] tpm: only attempt to disable the LPC CLKRUN if is already
 enabled

Commit 5e572cab92f0 ("tpm: Enable CLKRUN protocol for Braswell systems")
added logic in the TPM TIS driver to disable the Low Pin Count CLKRUN
signal during TPM transactions.

Unfortunately this breaks other devices that are attached to the LPC bus
like for example PS/2 mouse and keyboards.

One flaw with the logic is that it assumes that the CLKRUN is always
enabled, and so it unconditionally enables it after a TPM transaction.

But it could be that the CLKRUN signal was already disabled in the LPC
bus and so after the driver probes, the signal will remain enabled which
may break other devices transactions since the clocks will be restarted
by the CLKRUN# signal.

Fixes: 5e572cab92f0 ("tpm: Enable CLKRUN protocol for Braswell systems")
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---
 drivers/char/tpm/tpm_tis_core.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe Dec. 18, 2017, 5:55 p.m. UTC | #1
On Mon, Dec 18, 2017 at 01:29:01PM +0100, Javier Martinez Canillas wrote:
> On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:
> 
> [snip]
> 
> > 
> > James,
> > 
> > Can you please test the following (untested) patch on top of the other two
> > mentioned patches to see if it makes a difference for you?
> > 
> 
> I should had tried to at least compile the patch :)

I think this is backwards..

If CLKRUN_EN is on (eg power management is NOT enabled on LPC) then
TPM shouldn't do anything at all.

If CLKRUN_EN is off, then it should try to turn it on/off to save
power.

Perhaps the best work around is to just delete the turning off of
CLKRUN_EN ? Uses more power but keeps the clock running which should
keep both TPM and superio happy.

Jason
James Ettle Dec. 18, 2017, 6:26 p.m. UTC | #2
The keyboard and touchpad work OK with the patch quoted below and the earlier two applied, i.e. the three patches with signatures:

667dcc75be864ff4c17cf58891853b7393bba3e2
db3248e8a036c39141c8f7e9f1cf5c5ae6815f76
370d45a34dc8914066a995a3a6d6df1953ea9f60

I applied these to a vanilla kernel.org 4.14.7 source using Fedora 4.14.5-300.fc27 .config. Confirmed the tpm modules are loaded.

Tests:
1. Keyboard and touchpad work OK on boot - PASS
2. Still work after suspend/resume - PASS

Let me know if you want any further tests.

Many thanks,
James.


On 18/12/17 12:29, Javier Martinez Canillas wrote:
> On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:
> 
> [snip]
> 
>>
>> James,
>>
>> Can you please test the following (untested) patch on top of the other two
>> mentioned patches to see if it makes a difference for you?
>>
> 
> I should had tried to at least compile the patch :)
> 
> Updated patch below:
> 
> From 370d45a34dc8914066a995a3a6d6df1953ea9f60 Mon Sep 17 00:00:00 2001
> From: Javier Martinez Canillas <javierm@redhat.com>
> Date: Mon, 18 Dec 2017 12:56:28 +0100
> Subject: [PATCH v2] tpm: only attempt to disable the LPC CLKRUN if is already
>  enabled
> 
> Commit 5e572cab92f0 ("tpm: Enable CLKRUN protocol for Braswell systems")
> added logic in the TPM TIS driver to disable the Low Pin Count CLKRUN
> signal during TPM transactions.
> 
> Unfortunately this breaks other devices that are attached to the LPC bus
> like for example PS/2 mouse and keyboards.
> 
> One flaw with the logic is that it assumes that the CLKRUN is always
> enabled, and so it unconditionally enables it after a TPM transaction.
> 
> But it could be that the CLKRUN signal was already disabled in the LPC
> bus and so after the driver probes, the signal will remain enabled which
> may break other devices transactions since the clocks will be restarted
> by the CLKRUN# signal.
> 
> Fixes: 5e572cab92f0 ("tpm: Enable CLKRUN protocol for Braswell systems")
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> ---
>  drivers/char/tpm/tpm_tis_core.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> index e7bd2e750f69..5f2b1fc2194f 100644
> --- a/drivers/char/tpm/tpm_tis_core.c
> +++ b/drivers/char/tpm/tpm_tis_core.c
> @@ -688,7 +688,8 @@ static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value)
>  	struct tpm_tis_data *data = dev_get_drvdata(&chip->dev);
>  	u32 clkrun_val;
>  
> -	if (!IS_ENABLED(CONFIG_X86) || !is_bsw())
> +	if (!IS_ENABLED(CONFIG_X86) || !is_bsw() ||
> +	    !data->ilb_base_addr)
>  		return;
>  
>  	if (value) {
> @@ -746,7 +747,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  		      const struct tpm_tis_phy_ops *phy_ops,
>  		      acpi_handle acpi_dev_handle)
>  {
> -	u32 vendor, intfcaps, intmask;
> +	u32 vendor, intfcaps, intmask, clkrun_val;
>  	u8 rid;
>  	int rc, probe;
>  	struct tpm_chip *chip;
> @@ -772,6 +773,14 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  					ILB_REMAP_SIZE);
>  		if (!priv->ilb_base_addr)
>  			return -ENOMEM;
> +
> +		clkrun_val = ioread32(priv->ilb_base_addr + LPC_CNTRL_OFFSET);
> +		/* Check if CLKRUN# is already not enabled in the LPC bus */
> +		if (!(clkrun_val & LPC_CLKRUN_EN)) {
> +			priv->flags |= TPM_TIS_CLK_ENABLE;
> +			iounmap(priv->ilb_base_addr);
> +			priv->ilb_base_addr = NULL;
> +		}
>  	}
>  
>  	if (chip->ops->clk_enable != NULL)
> @@ -868,7 +877,7 @@ int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
>  	}
>  
>  	rc = tpm_chip_register(chip);
> -	if (rc && is_bsw())
> +	if (rc && is_bsw() && priv->ilb_base_addr)
>  		iounmap(priv->ilb_base_addr);
>  
>  	if (chip->ops->clk_enable != NULL)
>
Javier Martinez Canillas Dec. 18, 2017, 7:29 p.m. UTC | #3
Hello Jason,

Thanks a lot for your feedback.

On 12/18/2017 06:55 PM, Jason Gunthorpe wrote:
> On Mon, Dec 18, 2017 at 01:29:01PM +0100, Javier Martinez Canillas wrote:
>> On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:
>>
>> [snip]
>>
>>>
>>> James,
>>>
>>> Can you please test the following (untested) patch on top of the other two
>>> mentioned patches to see if it makes a difference for you?
>>>
>>
>> I should had tried to at least compile the patch :)
> 
> I think this is backwards..
> 
> If CLKRUN_EN is on (eg power management is NOT enabled on LPC) then
> TPM shouldn't do anything at all.
> 
> If CLKRUN_EN is off, then it should try to turn it on/off to save
> power.
>

My knowledge of LPC is near to non-existent so I please forgive me if I'm wrong,
but my understanding is that it's the opposite of what you said.

When CLKRUN_EN is SET, power management is ENABLED on the LPC bus and the host
LCLK clock may be stopped when entering in some low-power states.

The CLKRUN# signal is then used by peripherals to restart the LCLK clock after
resuming from low-power states to be able to transmit again.

When CLKRUN_EN is CLEAR, power management is DISABLED on the LPC bus and the
host LCLK clock is never stopped even when entering in some low-power states.

IIUC, if CLKRUN_EN is enabled, then all the devices attached to the LPC bus
have to support the CLKRUN protocol. My guess is that on some Braswell systems
LPC power management is enabled but the TPM device doesn't have CLKRUN support.

And that was the motivation of commit 5e572cab92f0 ("tpm: Enable CLKRUN protocol
for Braswell systems") that introduced the regression.

> Perhaps the best work around is to just delete the turning off of
> CLKRUN_EN ? Uses more power but keeps the clock running which should
> keep both TPM and superio happy.
>

But that's exactly the goal of the commit 5e572cab92f0, to disable the CLKRUN
protocol (probably for what I mentioned before). So doing so will cause issues
for those systems again.

What the patch I shared with James does is to avoid disabling the CLKRUN_EN
if this is already disabled. Which should be a no-op anyways but the problem
is that latter it's enabled even when the driver found disabled to start with.

I still don't like the overall solution since it's too error prone. Touching a
global bus configuration in a driver for a single peripheral isn't safe to say
the least.

But regardless of that, the patch I shared has merits on its own since is wrong
to keep the bus configuration in a different state that was before the driver
was probed. And in fact, James reported that it makes his system to work again.

> Jason
>

Best regards,
Javier Martinez Canillas Dec. 18, 2017, 7:34 p.m. UTC | #4
Hello James,

On 12/18/2017 07:26 PM, James Ettle wrote:
> The keyboard and touchpad work OK with the patch quoted below and the earlier two applied, i.e. the three patches with signatures:
> 
> 667dcc75be864ff4c17cf58891853b7393bba3e2
> db3248e8a036c39141c8f7e9f1cf5c5ae6815f76
> 370d45a34dc8914066a995a3a6d6df1953ea9f60
> 
> I applied these to a vanilla kernel.org 4.14.7 source using Fedora 4.14.5-300.fc27 .config. Confirmed the tpm modules are loaded.
> 
> Tests:
> 1. Keyboard and touchpad work OK on boot - PASS
> 2. Still work after suspend/resume - PASS
> 
> Let me know if you want any further tests.
>

Great, thanks a lot for testing!
 
> Many thanks,
> James.
>
Best regards,
Azhar Shaikh Dec. 18, 2017, 7:34 p.m. UTC | #5
>-----Original Message-----

>From: Javier Martinez Canillas [mailto:javierm@redhat.com]

>Sent: Monday, December 18, 2017 11:30 AM

>To: Jason Gunthorpe <jgg@ziepe.ca>

>Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>; James Ettle

><james@ettle.org.uk>; linux-integrity@vger.kernel.org; Shaikh, Azhar

><azhar.shaikh@intel.com>; linux-kernel@vger.kernel.org;

>james.l.morris@oracle.com

>Subject: Re: [BISECTED] tpm CLKRUN breaks PS/2 keyboard and touchpad on

>Braswell system

>

>Hello Jason,

>

>Thanks a lot for your feedback.

>

>On 12/18/2017 06:55 PM, Jason Gunthorpe wrote:

>> On Mon, Dec 18, 2017 at 01:29:01PM +0100, Javier Martinez Canillas wrote:

>>> On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:

>>>

>>> [snip]

>>>

>>>>

>>>> James,

>>>>

>>>> Can you please test the following (untested) patch on top of the

>>>> other two mentioned patches to see if it makes a difference for you?

>>>>

>>>

>>> I should had tried to at least compile the patch :)

>>

>> I think this is backwards..

>>

>> If CLKRUN_EN is on (eg power management is NOT enabled on LPC) then

>> TPM shouldn't do anything at all.

>>

>> If CLKRUN_EN is off, then it should try to turn it on/off to save

>> power.

>>

>

>My knowledge of LPC is near to non-existent so I please forgive me if I'm

>wrong, but my understanding is that it's the opposite of what you said.

>

>When CLKRUN_EN is SET, power management is ENABLED on the LPC bus and

>the host LCLK clock may be stopped when entering in some low-power states.

>

>The CLKRUN# signal is then used by peripherals to restart the LCLK clock after

>resuming from low-power states to be able to transmit again.

>

>When CLKRUN_EN is CLEAR, power management is DISABLED on the LPC bus

>and the host LCLK clock is never stopped even when entering in some low-

>power states.

>


Hi Javier,

Yes you are correct with the above understanding of the CLKRUN.

>IIUC, if CLKRUN_EN is enabled, then all the devices attached to the LPC bus

>have to support the CLKRUN protocol. My guess is that on some Braswell

>systems LPC power management is enabled but the TPM device doesn't have

>CLKRUN support.

>


I think this is what might be happening here.

Since I do not have an end product to test this on, I managed to get one BSW Reference Verification Platform(RVP). I flashed the latest Ubuntu (17.10) which is on 4.13.13 kernel version and does have the patch.

I was able to use the PS/2 mouse and keyboard immediately after bootup and also after suspend/resume cycle. The system does have a TPM.(/dev/tpm0 and /dev/tpmrm0 exist)


>And that was the motivation of commit 5e572cab92f0 ("tpm: Enable CLKRUN

>protocol for Braswell systems") that introduced the regression.

>

>> Perhaps the best work around is to just delete the turning off of

>> CLKRUN_EN ? Uses more power but keeps the clock running which should

>> keep both TPM and superio happy.

>>

>

>But that's exactly the goal of the commit 5e572cab92f0, to disable the CLKRUN

>protocol (probably for what I mentioned before). So doing so will cause issues

>for those systems again.

>

>What the patch I shared with James does is to avoid disabling the CLKRUN_EN

>if this is already disabled. Which should be a no-op anyways but the problem

>is that latter it's enabled even when the driver found disabled to start with.

>

>I still don't like the overall solution since it's too error prone. Touching a global

>bus configuration in a driver for a single peripheral isn't safe to say the least.

>

>But regardless of that, the patch I shared has merits on its own since is wrong

>to keep the bus configuration in a different state that was before the driver

>was probed. And in fact, James reported that it makes his system to work

>again.

>

>> Jason

>>

>

>Best regards,

>--

>Javier Martinez Canillas

>Software Engineer - Desktop Hardware Enablement Red Hat
Azhar Shaikh Dec. 18, 2017, 8:04 p.m. UTC | #6
>-----Original Message-----

>From: linux-integrity-owner@vger.kernel.org [mailto:linux-integrity-

>owner@vger.kernel.org] On Behalf Of Shaikh, Azhar

>Sent: Monday, December 18, 2017 11:34 AM

>To: Javier Martinez Canillas <javierm@redhat.com>; Jason Gunthorpe

><jgg@ziepe.ca>

>Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>; James Ettle

><james@ettle.org.uk>; linux-integrity@vger.kernel.org; linux-

>kernel@vger.kernel.org; james.l.morris@oracle.com

>Subject: RE: [BISECTED] tpm CLKRUN breaks PS/2 keyboard and touchpad on

>Braswell system

>

>

>

>>-----Original Message-----

>>From: Javier Martinez Canillas [mailto:javierm@redhat.com]

>>Sent: Monday, December 18, 2017 11:30 AM

>>To: Jason Gunthorpe <jgg@ziepe.ca>

>>Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>; James Ettle

>><james@ettle.org.uk>; linux-integrity@vger.kernel.org; Shaikh, Azhar

>><azhar.shaikh@intel.com>; linux-kernel@vger.kernel.org;

>>james.l.morris@oracle.com

>>Subject: Re: [BISECTED] tpm CLKRUN breaks PS/2 keyboard and touchpad on

>>Braswell system

>>

>>Hello Jason,

>>

>>Thanks a lot for your feedback.

>>

>>On 12/18/2017 06:55 PM, Jason Gunthorpe wrote:

>>> On Mon, Dec 18, 2017 at 01:29:01PM +0100, Javier Martinez Canillas wrote:

>>>> On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:

>>>>

>>>> [snip]

>>>>

>>>>>

>>>>> James,

>>>>>

>>>>> Can you please test the following (untested) patch on top of the

>>>>> other two mentioned patches to see if it makes a difference for you?

>>>>>

>>>>

>>>> I should had tried to at least compile the patch :)

>>>

>>> I think this is backwards..

>>>

>>> If CLKRUN_EN is on (eg power management is NOT enabled on LPC) then

>>> TPM shouldn't do anything at all.

>>>

>>> If CLKRUN_EN is off, then it should try to turn it on/off to save

>>> power.

>>>

>>

>>My knowledge of LPC is near to non-existent so I please forgive me if

>>I'm wrong, but my understanding is that it's the opposite of what you said.

>>

>>When CLKRUN_EN is SET, power management is ENABLED on the LPC bus

>and

>>the host LCLK clock may be stopped when entering in some low-power

>states.

>>

>>The CLKRUN# signal is then used by peripherals to restart the LCLK

>>clock after resuming from low-power states to be able to transmit again.

>>

>>When CLKRUN_EN is CLEAR, power management is DISABLED on the LPC bus

>>and the host LCLK clock is never stopped even when entering in some

>>low- power states.

>>

>

>Hi Javier,

>

>Yes you are correct with the above understanding of the CLKRUN.

>

>>IIUC, if CLKRUN_EN is enabled, then all the devices attached to the LPC

>>bus have to support the CLKRUN protocol. My guess is that on some

>>Braswell systems LPC power management is enabled but the TPM device

>>doesn't have CLKRUN support.

>>

>

>I think this is what might be happening here.

>

>Since I do not have an end product to test this on, I managed to get one BSW

>Reference Verification Platform(RVP). I flashed the latest Ubuntu (17.10)

>which is on 4.13.13 kernel version and does have the patch.

>


The correct kernel version is 4.13.0-19

>I was able to use the PS/2 mouse and keyboard immediately after bootup and

>also after suspend/resume cycle. The system does have a TPM.(/dev/tpm0

>and /dev/tpmrm0 exist)

>

>


The RVP has a Nuvoton TPM

>>And that was the motivation of commit 5e572cab92f0 ("tpm: Enable CLKRUN

>>protocol for Braswell systems") that introduced the regression.

>>

>>> Perhaps the best work around is to just delete the turning off of

>>> CLKRUN_EN ? Uses more power but keeps the clock running which should

>>> keep both TPM and superio happy.

>>>

>>

>>But that's exactly the goal of the commit 5e572cab92f0, to disable the

>>CLKRUN protocol (probably for what I mentioned before). So doing so

>>will cause issues for those systems again.

>>

>>What the patch I shared with James does is to avoid disabling the

>>CLKRUN_EN if this is already disabled. Which should be a no-op anyways

>>but the problem is that latter it's enabled even when the driver found

>disabled to start with.

>>

>>I still don't like the overall solution since it's too error prone.

>>Touching a global bus configuration in a driver for a single peripheral isn't

>safe to say the least.

>>

>>But regardless of that, the patch I shared has merits on its own since

>>is wrong to keep the bus configuration in a different state that was

>>before the driver was probed. And in fact, James reported that it makes

>>his system to work again.

>>

>>> Jason

>>>

>>

>>Best regards,

>>--

>>Javier Martinez Canillas

>>Software Engineer - Desktop Hardware Enablement Red Hat
Jason Gunthorpe Dec. 18, 2017, 8:17 p.m. UTC | #7
On Mon, Dec 18, 2017 at 08:29:30PM +0100, Javier Martinez Canillas wrote:

> My knowledge of LPC is near to non-existent so I please forgive me if I'm wrong,
> but my understanding is that it's the opposite of what you said.
> 
> When CLKRUN_EN is SET, power management is ENABLED on the LPC bus and the host
> LCLK clock may be stopped when entering in some low-power states.
>
> The CLKRUN# signal is then used by peripherals to restart the LCLK clock after
> resuming from low-power states to be able to transmit again.
> 
> When CLKRUN_EN is CLEAR, power management is DISABLED on the LPC bus and the
> host LCLK clock is never stopped even when entering in some low-power states.

Okay, that makes sense.

Jason
Jason Gunthorpe Dec. 18, 2017, 8:19 p.m. UTC | #8
On Mon, Dec 18, 2017 at 07:34:29PM +0000, Shaikh, Azhar wrote:

> >IIUC, if CLKRUN_EN is enabled, then all the devices attached to the
> >LPC bus have to support the CLKRUN protocol. My guess is that on
> >some Braswell systems LPC power management is enabled but the TPM
> >device doesn't have CLKRUN support.
> 
> I think this is what might be happening here.

That makes it a BIOS bug, not a chipset bug, and we shouldn't be
trying to fix it like this in Linux.

Based on the original discussion I always thought this was an Intel
chipset bug and applies to all cases.

Jason
Javier Martinez Canillas Dec. 18, 2017, 11:06 p.m. UTC | #9
Hello Azhar,

On 12/18/2017 08:34 PM, Shaikh, Azhar wrote:
> 
> 
>> -----Original Message-----
>> From: Javier Martinez Canillas [mailto:javierm@redhat.com]
>> Sent: Monday, December 18, 2017 11:30 AM
>> To: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>; James Ettle
>> <james@ettle.org.uk>; linux-integrity@vger.kernel.org; Shaikh, Azhar
>> <azhar.shaikh@intel.com>; linux-kernel@vger.kernel.org;
>> james.l.morris@oracle.com
>> Subject: Re: [BISECTED] tpm CLKRUN breaks PS/2 keyboard and touchpad on
>> Braswell system
>>
>> Hello Jason,
>>
>> Thanks a lot for your feedback.
>>
>> On 12/18/2017 06:55 PM, Jason Gunthorpe wrote:
>>> On Mon, Dec 18, 2017 at 01:29:01PM +0100, Javier Martinez Canillas wrote:
>>>> On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:
>>>>
>>>> [snip]
>>>>
>>>>>
>>>>> James,
>>>>>
>>>>> Can you please test the following (untested) patch on top of the
>>>>> other two mentioned patches to see if it makes a difference for you?
>>>>>
>>>>
>>>> I should had tried to at least compile the patch :)
>>>
>>> I think this is backwards..
>>>
>>> If CLKRUN_EN is on (eg power management is NOT enabled on LPC) then
>>> TPM shouldn't do anything at all.
>>>
>>> If CLKRUN_EN is off, then it should try to turn it on/off to save
>>> power.
>>>
>>
>> My knowledge of LPC is near to non-existent so I please forgive me if I'm
>> wrong, but my understanding is that it's the opposite of what you said.
>>
>> When CLKRUN_EN is SET, power management is ENABLED on the LPC bus and
>> the host LCLK clock may be stopped when entering in some low-power states.
>>
>> The CLKRUN# signal is then used by peripherals to restart the LCLK clock after
>> resuming from low-power states to be able to transmit again.
>>
>> When CLKRUN_EN is CLEAR, power management is DISABLED on the LPC bus
>> and the host LCLK clock is never stopped even when entering in some low-
>> power states.
>>
> 
> Hi Javier,
> 
> Yes you are correct with the above understanding of the CLKRUN.
>

Thanks for the confirmation.
 
>> IIUC, if CLKRUN_EN is enabled, then all the devices attached to the LPC bus
>> have to support the CLKRUN protocol. My guess is that on some Braswell
>> systems LPC power management is enabled but the TPM device doesn't have
>> CLKRUN support.
>>
> 
> I think this is what might be happening here.
>

One question, what happens if the CLKRUN protocol is disabled and never enabled
again? My understanding is that the CLKRUN# signal is not required if the host
never stops the LCLK clock, and peripherals that do support the CLKRUN protocol
should still work if the CLKRUN protocol is disabled and the LCLK isn't stopped.

So from what we discussed above, I think it may be correct to completely disable
CLKRUN protocol if there's a peripheral that doesn't support it.
 
> Since I do not have an end product to test this on, I managed to get one BSW Reference Verification Platform(RVP). I flashed the latest Ubuntu (17.10) which is on 4.13.13 kernel version and does have the patch.
> 
> I was able to use the PS/2 mouse and keyboard immediately after bootup and also after suspend/resume cycle. The system does have a TPM.(/dev/tpm0 and /dev/tpmrm0 exist)
>

It's not clear to me if the RVP system is the same one where you found the issue
and lead to the commit that caused the regression.

Could you please test what happens if you disable the CLKRUN_EN and never enable
it again?

Best regards,
Javier Martinez Canillas Dec. 18, 2017, 11:34 p.m. UTC | #10
Hello Jason,

On 12/18/2017 09:19 PM, Jason Gunthorpe wrote:
> On Mon, Dec 18, 2017 at 07:34:29PM +0000, Shaikh, Azhar wrote:
> 
>>> IIUC, if CLKRUN_EN is enabled, then all the devices attached to the
>>> LPC bus have to support the CLKRUN protocol. My guess is that on
>>> some Braswell systems LPC power management is enabled but the TPM
>>> device doesn't have CLKRUN support.
>>
>> I think this is what might be happening here.
> 
> That makes it a BIOS bug, not a chipset bug, and we shouldn't be
> trying to fix it like this in Linux.
>

Indeed, the system integrator should make sure that all peripherals that
are connected through the LPC bus either support the CLKRUN protocol and
CLKRUN_EN is enabled or CLKRUN_EN should be disabled.
 
> Based on the original discussion I always thought this was an Intel
> chipset bug and applies to all cases.
>

After thinking about this and with a better understanding of the issue,
I think we have 2 options (please let me know if I got something wrong):

1) Leave the code as is and apply the patch I shared with James. In that
   case the CLKRUN protocol will be disabled only during TPM transactions
   and not enabled again after transactions if it wasn't enabled.

   This shouldn't affect other peripherals since even if they have CLKRUN
   support, they should work correctly while CLKRUN protocol is disabled.

   The disadvantage is that TPM devices that have CLKRUN support (do they
   exist?) will not take the advantage of the power management feature of
   stopping the LPC host LCLK clock during low-power states.

2) Drop the pending CLKRUN patch in linux-tpmdd and revert CLKRUN commit
   in mainline. And instead of disabling the CLKRUN protocol during the
   TPM transactions, disable if the CLKRUN_EN is enabled and the system
   is in a list of systems that have a TPM that doesn't support CLKRUN.

   This list could be for example a struct dmi_system_id array and match
   using DMI data on module_init().

   The advantage is that TPM devices with CLKRUN protocol support could
   make use of the CLKRUN power management feature and only systems with
   a TPM that doesn't support the CLKRUN protocol will disable it.

   The disadvantage is that the struct dmi_system_id array to match will
   have to be maintained and every known-to-be-broken system added to it.

Thoughts?

> Jason
> 

Best regards,
Jason Gunthorpe Dec. 19, 2017, 2:08 a.m. UTC | #11
On Tue, Dec 19, 2017 at 12:34:46AM +0100, Javier Martinez Canillas wrote:
> Hello Jason,
> 
> On 12/18/2017 09:19 PM, Jason Gunthorpe wrote:
> > On Mon, Dec 18, 2017 at 07:34:29PM +0000, Shaikh, Azhar wrote:
> > 
> >>> IIUC, if CLKRUN_EN is enabled, then all the devices attached to the
> >>> LPC bus have to support the CLKRUN protocol. My guess is that on
> >>> some Braswell systems LPC power management is enabled but the TPM
> >>> device doesn't have CLKRUN support.
> >>
> >> I think this is what might be happening here.
> > 
> > That makes it a BIOS bug, not a chipset bug, and we shouldn't be
> > trying to fix it like this in Linux.
> >
> 
> Indeed, the system integrator should make sure that all peripherals that
> are connected through the LPC bus either support the CLKRUN protocol and
> CLKRUN_EN is enabled or CLKRUN_EN should be disabled.

I would like to hear from Intel why they made this patch in the first
place, since we are mostly guessing...

> 1) Leave the code as is and apply the patch I shared with James. In that
>    case the CLKRUN protocol will be disabled only during TPM transactions
>    and not enabled again after transactions if it wasn't enabled.

This seems necessary no matter what, especially if it works..

> 2) Drop the pending CLKRUN patch in linux-tpmdd and revert CLKRUN commit
>    in mainline. And instead of disabling the CLKRUN protocol during the
>    TPM transactions, disable if the CLKRUN_EN is enabled and the system
>    is in a list of systems that have a TPM that doesn't support CLKRUN.

This is also a good possible idea.

Again, depends why Intel did this. I assumed it was a chipset bug
since Intel was sending it not a system builder..

Jason
Jarkko Sakkinen Dec. 19, 2017, 1 p.m. UTC | #12
On Mon, Dec 18, 2017 at 06:26:35PM +0000, James Ettle wrote:
> The keyboard and touchpad work OK with the patch quoted below and the earlier two applied, i.e. the three patches with signatures:
> 
> 667dcc75be864ff4c17cf58891853b7393bba3e2
> db3248e8a036c39141c8f7e9f1cf5c5ae6815f76
> 370d45a34dc8914066a995a3a6d6df1953ea9f60
> 
> I applied these to a vanilla kernel.org 4.14.7 source using Fedora 4.14.5-300.fc27 .config. Confirmed the tpm modules are loaded.
> 
> Tests:
> 1. Keyboard and touchpad work OK on boot - PASS
> 2. Still work after suspend/resume - PASS
> 
> Let me know if you want any further tests.
> 
> Many thanks,
> James.

Thank you for your awesome help! I guess Javier can add your
Tested-by to the patch?

/Jarkko
Jarkko Sakkinen Dec. 19, 2017, 1:02 p.m. UTC | #13
On Mon, Dec 18, 2017 at 10:55:02AM -0700, Jason Gunthorpe wrote:
> On Mon, Dec 18, 2017 at 01:29:01PM +0100, Javier Martinez Canillas wrote:
> > On 12/18/2017 01:22 PM, Javier Martinez Canillas wrote:
> > 
> > [snip]
> > 
> > > 
> > > James,
> > > 
> > > Can you please test the following (untested) patch on top of the other two
> > > mentioned patches to see if it makes a difference for you?
> > > 
> > 
> > I should had tried to at least compile the patch :)
> 
> I think this is backwards..
> 
> If CLKRUN_EN is on (eg power management is NOT enabled on LPC) then
> TPM shouldn't do anything at all.
> 
> If CLKRUN_EN is off, then it should try to turn it on/off to save
> power.
> 
> Perhaps the best work around is to just delete the turning off of
> CLKRUN_EN ? Uses more power but keeps the clock running which should
> keep both TPM and superio happy.
> 
> Jason

On some BSW systems keeping CLKRUN_EN on during TPM command caused
issues. That was the original reason for the fixes.

/Jarkko
Jarkko Sakkinen Dec. 19, 2017, 1:04 p.m. UTC | #14
On Mon, Dec 18, 2017 at 01:19:02PM -0700, Jason Gunthorpe wrote:
> On Mon, Dec 18, 2017 at 07:34:29PM +0000, Shaikh, Azhar wrote:
> 
> > >IIUC, if CLKRUN_EN is enabled, then all the devices attached to the
> > >LPC bus have to support the CLKRUN protocol. My guess is that on
> > >some Braswell systems LPC power management is enabled but the TPM
> > >device doesn't have CLKRUN support.
> > 
> > I think this is what might be happening here.
> 
> That makes it a BIOS bug, not a chipset bug, and we shouldn't be
> trying to fix it like this in Linux.
> 
> Based on the original discussion I always thought this was an Intel
> chipset bug and applies to all cases.
> 
> Jason

It would not be first time when some BIOS issues are circumvented in the
kernel.

/Jarkko
Jarkko Sakkinen Dec. 19, 2017, 1:10 p.m. UTC | #15
On Tue, Dec 19, 2017 at 12:34:46AM +0100, Javier Martinez Canillas wrote:
> Hello Jason,
> 
> On 12/18/2017 09:19 PM, Jason Gunthorpe wrote:
> > On Mon, Dec 18, 2017 at 07:34:29PM +0000, Shaikh, Azhar wrote:
> > 
> >>> IIUC, if CLKRUN_EN is enabled, then all the devices attached to the
> >>> LPC bus have to support the CLKRUN protocol. My guess is that on
> >>> some Braswell systems LPC power management is enabled but the TPM
> >>> device doesn't have CLKRUN support.
> >>
> >> I think this is what might be happening here.
> > 
> > That makes it a BIOS bug, not a chipset bug, and we shouldn't be
> > trying to fix it like this in Linux.
> >
> 
> Indeed, the system integrator should make sure that all peripherals that
> are connected through the LPC bus either support the CLKRUN protocol and
> CLKRUN_EN is enabled or CLKRUN_EN should be disabled.
>  
> > Based on the original discussion I always thought this was an Intel
> > chipset bug and applies to all cases.
> >
> 
> After thinking about this and with a better understanding of the issue,
> I think we have 2 options (please let me know if I got something wrong):
> 
> 1) Leave the code as is and apply the patch I shared with James. In that
>    case the CLKRUN protocol will be disabled only during TPM transactions
>    and not enabled again after transactions if it wasn't enabled.
> 
>    This shouldn't affect other peripherals since even if they have CLKRUN
>    support, they should work correctly while CLKRUN protocol is disabled.
> 
>    The disadvantage is that TPM devices that have CLKRUN support (do they
>    exist?) will not take the advantage of the power management feature of
>    stopping the LPC host LCLK clock during low-power states.

CLKRUN is enabled after TPM has processed the command so how this could
make power mgmt worse?

/Jarkko
Javier Martinez Canillas Dec. 19, 2017, 1:12 p.m. UTC | #16
On 12/19/2017 02:10 PM, Jarkko Sakkinen wrote:
> On Tue, Dec 19, 2017 at 12:34:46AM +0100, Javier Martinez Canillas wrote:
>> Hello Jason,
>>
>> On 12/18/2017 09:19 PM, Jason Gunthorpe wrote:
>>> On Mon, Dec 18, 2017 at 07:34:29PM +0000, Shaikh, Azhar wrote:
>>>
>>>>> IIUC, if CLKRUN_EN is enabled, then all the devices attached to the
>>>>> LPC bus have to support the CLKRUN protocol. My guess is that on
>>>>> some Braswell systems LPC power management is enabled but the TPM
>>>>> device doesn't have CLKRUN support.
>>>>
>>>> I think this is what might be happening here.
>>>
>>> That makes it a BIOS bug, not a chipset bug, and we shouldn't be
>>> trying to fix it like this in Linux.
>>>
>>
>> Indeed, the system integrator should make sure that all peripherals that
>> are connected through the LPC bus either support the CLKRUN protocol and
>> CLKRUN_EN is enabled or CLKRUN_EN should be disabled.
>>  
>>> Based on the original discussion I always thought this was an Intel
>>> chipset bug and applies to all cases.
>>>
>>
>> After thinking about this and with a better understanding of the issue,
>> I think we have 2 options (please let me know if I got something wrong):
>>
>> 1) Leave the code as is and apply the patch I shared with James. In that
>>    case the CLKRUN protocol will be disabled only during TPM transactions
>>    and not enabled again after transactions if it wasn't enabled.
>>
>>    This shouldn't affect other peripherals since even if they have CLKRUN
>>    support, they should work correctly while CLKRUN protocol is disabled.
>>
>>    The disadvantage is that TPM devices that have CLKRUN support (do they
>>    exist?) will not take the advantage of the power management feature of
>>    stopping the LPC host LCLK clock during low-power states.
> 
> CLKRUN is enabled after TPM has processed the command so how this could
> make power mgmt worse?
>

I meant if there are Braswell systems with a TPM that _does_ support CLKRUN
protocol. Since the current code disables ClKRUN_EN for all Braswell boards.
 
> /Jarkko
> 

Best regards,
James Ettle Dec. 19, 2017, 1:18 p.m. UTC | #17
I'm OK with that.

Do you need any more info on the specifics of the machine I'm using?

(Note: I'm not currently aware of any BIOS updates for it. If this is ultimately a BIOS problem I wouldn't know what sort of report to file, or how to file it anyway...)

Thanks,
James

On 19/12/17 13:00, Jarkko Sakkinen wrote:
> On Mon, Dec 18, 2017 at 06:26:35PM +0000, James Ettle wrote:
>> The keyboard and touchpad work OK with the patch quoted below and the earlier two applied, i.e. the three patches with signatures:
>>
>> 667dcc75be864ff4c17cf58891853b7393bba3e2
>> db3248e8a036c39141c8f7e9f1cf5c5ae6815f76
>> 370d45a34dc8914066a995a3a6d6df1953ea9f60
>>
>> I applied these to a vanilla kernel.org 4.14.7 source using Fedora 4.14.5-300.fc27 .config. Confirmed the tpm modules are loaded.
>>
>> Tests:
>> 1. Keyboard and touchpad work OK on boot - PASS
>> 2. Still work after suspend/resume - PASS
>>
>> Let me know if you want any further tests.
>>
>> Many thanks,
>> James.
> 
> Thank you for your awesome help! I guess Javier can add your
> Tested-by to the patch?
> 
> /Jarkko
>
Jason Gunthorpe Dec. 19, 2017, 8:20 p.m. UTC | #18
On Tue, Dec 19, 2017 at 03:04:32PM +0200, Jarkko Sakkinen wrote:

> > That makes it a BIOS bug, not a chipset bug, and we shouldn't be
> > trying to fix it like this in Linux.
> > 
> > Based on the original discussion I always thought this was an Intel
> > chipset bug and applies to all cases.
> 
> It would not be first time when some BIOS issues are circumvented in the
> kernel.

Yes, but we don't circumvent BIOS bugs by detecting the chipset type,
we look for the buggy BIOS.

Jason
Jarkko Sakkinen Dec. 22, 2017, 6:30 p.m. UTC | #19
> I'm OK with that.
> 
> Do you need any more info on the specifics of the machine I'm using?
> 
> (Note: I'm not currently aware of any BIOS updates for it. If this is ultimately a BIOS problem I wouldn't know what sort of report to file, or how to file it anyway...)
> 
> Thanks,
> James

I just found out that the original bug was specific to BSW with
Infineon SLB9655 chip.

Unless you have this specific chip you can just give Tested-by
(and Reviewed-by).

/Jarkko
diff mbox

Patch

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index e7bd2e750f69..5f2b1fc2194f 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -688,7 +688,8 @@  static void tpm_tis_clkrun_enable(struct tpm_chip *chip, bool value)
 	struct tpm_tis_data *data = dev_get_drvdata(&chip->dev);
 	u32 clkrun_val;
 
-	if (!IS_ENABLED(CONFIG_X86) || !is_bsw())
+	if (!IS_ENABLED(CONFIG_X86) || !is_bsw() ||
+	    !data->ilb_base_addr)
 		return;
 
 	if (value) {
@@ -746,7 +747,7 @@  int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 		      const struct tpm_tis_phy_ops *phy_ops,
 		      acpi_handle acpi_dev_handle)
 {
-	u32 vendor, intfcaps, intmask;
+	u32 vendor, intfcaps, intmask, clkrun_val;
 	u8 rid;
 	int rc, probe;
 	struct tpm_chip *chip;
@@ -772,6 +773,14 @@  int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 					ILB_REMAP_SIZE);
 		if (!priv->ilb_base_addr)
 			return -ENOMEM;
+
+		clkrun_val = ioread32(priv->ilb_base_addr + LPC_CNTRL_OFFSET);
+		/* Check if CLKRUN# is already not enabled in the LPC bus */
+		if (!(clkrun_val & LPC_CLKRUN_EN)) {
+			priv->flags |= TPM_TIS_CLK_ENABLE;
+			iounmap(priv->ilb_base_addr);
+			priv->ilb_base_addr = NULL;
+		}
 	}
 
 	if (chip->ops->clk_enable != NULL)
@@ -868,7 +877,7 @@  int tpm_tis_core_init(struct device *dev, struct tpm_tis_data *priv, int irq,
 	}
 
 	rc = tpm_chip_register(chip);
-	if (rc && is_bsw())
+	if (rc && is_bsw() && priv->ilb_base_addr)
 		iounmap(priv->ilb_base_addr);
 
 	if (chip->ops->clk_enable != NULL)