diff mbox

dell-smm-hwmon: Cache fan_type() calls and use fan_status() for fan detection

Message ID 1463842001-17785-1-git-send-email-pali.rohar@gmail.com (mailing list archive)
State Accepted
Headers show

Commit Message

Pali Rohár May 21, 2016, 2:46 p.m. UTC
On more Dell machines (e.g. Dell Precision M3800) I8K_SMM_GET_FAN_TYPE call
is too expensive (CPU is too long in SMM mode) and cause kernel to hang.
This patch cache type for each fan (as it should not change) and change the
way how fan presense is detected. It revert and use function fan_status()
as was before commit f989e55452c7 ("i8k: Add support for fan labels").

Moreover, kernel hangs for 2 - 3 seconds only sometimes and only on some
Dell machines. When kernel hangs fan speed is at max. So it was hard to
debug and bisect where is root of this problem. It looks like this is bug
in Dell BIOS which implement fan type SMM code... and there is no way how
to fix it in kernel.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Reported-and-tested-by: Tolga Cakir <cevelnet@gmail.com>
Fixes: f989e55452c7 ("i8k: Add support for fan labels")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=112021
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100121
Cc: stable@vger.kernel.org # v4.0+, will need backport
---
 drivers/hwmon/dell-smm-hwmon.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)
---

Hi!

Jan C Peters, Thorsten Leemhuis, David Santamaría Rogado, Peter Saunderson
and others, can you test this patch if it fixes your freeze problem at
boottime and when using "sensors" program?

I need to know if this patch fixes problem on Dell Studio XPS 8000 and
Dell Studio XPS 8100 machines, so we can revert git commits:

6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")

Comments

Guenter Roeck May 22, 2016, 12:19 a.m. UTC | #1
On 05/21/2016 07:46 AM, Pali Rohár wrote:
> On more Dell machines (e.g. Dell Precision M3800) I8K_SMM_GET_FAN_TYPE call
> is too expensive (CPU is too long in SMM mode) and cause kernel to hang.
> This patch cache type for each fan (as it should not change) and change the
> way how fan presense is detected. It revert and use function fan_status()
> as was before commit f989e55452c7 ("i8k: Add support for fan labels").
>
> Moreover, kernel hangs for 2 - 3 seconds only sometimes and only on some
> Dell machines. When kernel hangs fan speed is at max. So it was hard to
> debug and bisect where is root of this problem. It looks like this is bug
> in Dell BIOS which implement fan type SMM code... and there is no way how
> to fix it in kernel.
>
> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> Reviewed-by: Jean Delvare <jdelvare@suse.de>
> Reported-and-tested-by: Tolga Cakir <cevelnet@gmail.com>
> Fixes: f989e55452c7 ("i8k: Add support for fan labels")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=112021
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=100121
> Cc: stable@vger.kernel.org # v4.0+, will need backport

Should this patch be applied, or do you wait for more testing ?

Guenter

> ---
>   drivers/hwmon/dell-smm-hwmon.c |   21 ++++++++++++++++-----
>   1 file changed, 16 insertions(+), 5 deletions(-)
> ---
>
> Hi!
>
> Jan C Peters, Thorsten Leemhuis, David Santamaría Rogado, Peter Saunderson
> and others, can you test this patch if it fixes your freeze problem at
> boottime and when using "sensors" program?
>
> I need to know if this patch fixes problem on Dell Studio XPS 8000 and
> Dell Studio XPS 8100 machines, so we can revert git commits:
>
> 6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
> a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")
>
> diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
> index c43318d..577445f 100644
> --- a/drivers/hwmon/dell-smm-hwmon.c
> +++ b/drivers/hwmon/dell-smm-hwmon.c
> @@ -235,7 +235,7 @@ static int i8k_get_fan_speed(int fan)
>   /*
>    * Read the fan type.
>    */
> -static int i8k_get_fan_type(int fan)
> +static int _i8k_get_fan_type(int fan)
>   {
>   	struct smm_regs regs = { .eax = I8K_SMM_GET_FAN_TYPE, };
>
> @@ -243,6 +243,17 @@ static int i8k_get_fan_type(int fan)
>   	return i8k_smm(&regs) ? : regs.eax & 0xff;
>   }
>
> +static int i8k_get_fan_type(int fan)
> +{
> +	/* I8K_SMM_GET_FAN_TYPE SMM call is expensive, so cache values */
> +	static int types[2] = { INT_MIN, INT_MIN };
> +
> +	if (types[fan] == INT_MIN)
> +		types[fan] = _i8k_get_fan_type(fan);
> +
> +	return types[fan];
> +}
> +
>   /*
>    * Read the fan nominal rpm for specific fan speed.
>    */
> @@ -767,13 +778,13 @@ static int __init i8k_init_hwmon(void)
>   	if (err >= 0)
>   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
>
> -	/* First fan attributes, if fan type is OK */
> -	err = i8k_get_fan_type(0);
> +	/* First fan attributes, if fan status is OK */
> +	err = i8k_get_fan_status(0);
>   	if (err >= 0)
>   		i8k_hwmon_flags |= I8K_HWMON_HAVE_FAN1;
>
> -	/* Second fan attributes, if fan type is OK */
> -	err = i8k_get_fan_type(1);
> +	/* Second fan attributes, if fan status is OK */
> +	err = i8k_get_fan_status(1);
>   	if (err >= 0)
>   		i8k_hwmon_flags |= I8K_HWMON_HAVE_FAN2;
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pali Rohár May 22, 2016, 12:28 a.m. UTC | #2
On Sunday 22 May 2016 02:19:48 Guenter Roeck wrote:
> On 05/21/2016 07:46 AM, Pali Rohár wrote:
> > On more Dell machines (e.g. Dell Precision M3800)
> > I8K_SMM_GET_FAN_TYPE call is too expensive (CPU is too long in SMM
> > mode) and cause kernel to hang. This patch cache type for each fan
> > (as it should not change) and change the way how fan presense is
> > detected. It revert and use function fan_status() as was before
> > commit f989e55452c7 ("i8k: Add support for fan labels").
> > 
> > Moreover, kernel hangs for 2 - 3 seconds only sometimes and only on
> > some Dell machines. When kernel hangs fan speed is at max. So it
> > was hard to debug and bisect where is root of this problem. It
> > looks like this is bug in Dell BIOS which implement fan type SMM
> > code... and there is no way how to fix it in kernel.
> > 
> > Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> > Reviewed-by: Jean Delvare <jdelvare@suse.de>
> > Reported-and-tested-by: Tolga Cakir <cevelnet@gmail.com>
> > Fixes: f989e55452c7 ("i8k: Add support for fan labels")
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=112021
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=100121
> > Cc: stable@vger.kernel.org # v4.0+, will need backport
> 
> Should this patch be applied, or do you wait for more testing ?

I would like to hear some confirmation from people with affected machine.

> Guenter
> 
> > ---
> > 
> >   drivers/hwmon/dell-smm-hwmon.c |   21 ++++++++++++++++-----
> >   1 file changed, 16 insertions(+), 5 deletions(-)
> > 
> > ---
> > 
> > Hi!
> > 
> > Jan C Peters, Thorsten Leemhuis, David Santamaría Rogado, Peter
> > Saunderson and others, can you test this patch if it fixes your
> > freeze problem at boottime and when using "sensors" program?
> > 
> > I need to know if this patch fixes problem on Dell Studio XPS 8000
> > and Dell Studio XPS 8100 machines, so we can revert git commits:
> > 
> > 6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
> > a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")
> > 
> > diff --git a/drivers/hwmon/dell-smm-hwmon.c
> > b/drivers/hwmon/dell-smm-hwmon.c index c43318d..577445f 100644
> > --- a/drivers/hwmon/dell-smm-hwmon.c
> > +++ b/drivers/hwmon/dell-smm-hwmon.c
> > @@ -235,7 +235,7 @@ static int i8k_get_fan_speed(int fan)
> > 
> >   /*
> >   
> >    * Read the fan type.
> >    */
> > 
> > -static int i8k_get_fan_type(int fan)
> > +static int _i8k_get_fan_type(int fan)
> > 
> >   {
> >   
> >   	struct smm_regs regs = { .eax = I8K_SMM_GET_FAN_TYPE, };
> > 
> > @@ -243,6 +243,17 @@ static int i8k_get_fan_type(int fan)
> > 
> >   	return i8k_smm(&regs) ? : regs.eax & 0xff;
> >   
> >   }
> > 
> > +static int i8k_get_fan_type(int fan)
> > +{
> > +	/* I8K_SMM_GET_FAN_TYPE SMM call is expensive, so cache values */
> > +	static int types[2] = { INT_MIN, INT_MIN };
> > +
> > +	if (types[fan] == INT_MIN)
> > +		types[fan] = _i8k_get_fan_type(fan);
> > +
> > +	return types[fan];
> > +}
> > +
> > 
> >   /*
> >   
> >    * Read the fan nominal rpm for specific fan speed.
> >    */
> > 
> > @@ -767,13 +778,13 @@ static int __init i8k_init_hwmon(void)
> > 
> >   	if (err >= 0)
> >   	
> >   		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
> > 
> > -	/* First fan attributes, if fan type is OK */
> > -	err = i8k_get_fan_type(0);
> > +	/* First fan attributes, if fan status is OK */
> > +	err = i8k_get_fan_status(0);
> > 
> >   	if (err >= 0)
> >   	
> >   		i8k_hwmon_flags |= I8K_HWMON_HAVE_FAN1;
> > 
> > -	/* Second fan attributes, if fan type is OK */
> > -	err = i8k_get_fan_type(1);
> > +	/* Second fan attributes, if fan status is OK */
> > +	err = i8k_get_fan_status(1);
> > 
> >   	if (err >= 0)
> >   	
> >   		i8k_hwmon_flags |= I8K_HWMON_HAVE_FAN2;
Thorsten Leemhuis May 26, 2016, 3:39 p.m. UTC | #3
Lo!

Pali Rohár wrote on 21.05.2016 16:46:
> […] Thorsten […] can you test this patch if it fixes your freeze problem at
> boottime and when using "sensors" program?

FWIW, I never saw either of those problems. I only saw the third issue
that was mentioned: the CPU fan speed is going up and down as described
in https://bugzilla.kernel.org/show_bug.cgi?id=100121

> I need to know if this patch fixes problem on Dell Studio XPS 8000 and
> Dell Studio XPS 8100 machines, so we can revert git commits:
> 6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
> a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")

Just checked: Sorry, that patch did not fix the CPU fan speed issue on
my XPS 8000, so there is afaics no reason to revert 6220f4ebd7b4 :-/

HTH, CU, knurd

--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pali Rohár May 26, 2016, 3:51 p.m. UTC | #4
On Thursday 26 May 2016 17:39:57 Thorsten Leemhuis wrote:
> > I need to know if this patch fixes problem on Dell Studio XPS 8000
> > and Dell Studio XPS 8100 machines, so we can revert git commits:
> > 6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
> > a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")
> 
> Just checked: Sorry, that patch did not fix the CPU fan speed issue
> on my XPS 8000, so there is afaics no reason to revert 6220f4ebd7b4
> :-/

Please, can you apply patch "dell-smm-hwmon: In debug mode log duration 
of SMM calls" and compile dell-smm-hwmon in debug mode to tell us which 
SMM call takes too long? This is probably they key for that XPS 
problems...
Thorsten Leemhuis May 27, 2016, 8 a.m. UTC | #5
Pali Rohár wrote on 26.05.2016 17:51:
> On Thursday 26 May 2016 17:39:57 Thorsten Leemhuis wrote:
>> > I need to know if this patch fixes problem on Dell Studio XPS 8000
>> > and Dell Studio XPS 8100 machines, so we can revert git commits:
>> > 6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
>> > a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")
>> Just checked: Sorry, that patch did not fix the CPU fan speed issue
>> on my XPS 8000, so there is afaics no reason to revert 6220f4ebd7b4
>> :-/
> Please, can you apply patch "dell-smm-hwmon: In debug mode log duration 
> of SMM calls" 

On top of "Cache fan_type() calls and use fan_status() for fan 
detection" or with that patch reverted? I did the latter and 
tested with 4.6 (just mentioning it here in case it matters).

> and compile dell-smm-hwmon in debug mode to tell us which 
> SMM call takes too long? This is probably they key for that XPS 
> problems...

[    5.941037] dell_smm_hwmon: smm(0xfea3 0x0000) = 0x4147  (took   14470 usecs)
[    5.941139] dell_smm_hwmon: smm(0x11a3 0x0000) = 0xffff  (took      93 usecs)
[    5.941236] dell_smm_hwmon: smm(0x11a3 0x0001) = 0xffff  (took      93 usecs)
[    5.941331] dell_smm_hwmon: smm(0x11a3 0x0002) = 0xffff  (took      92 usecs)
[    5.941427] dell_smm_hwmon: smm(0x11a3 0x0003) = 0xffff  (took      92 usecs)
[    5.941526] dell_smm_hwmon: smm(0x03a3 0x0000) = 0x0000  (took      95 usecs)
[    5.941625] dell_smm_hwmon: smm(0x03a3 0x0001) = 0x0001  (took      95 usecs)

HTH, CU, knurd
--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pali Rohár May 27, 2016, 9:47 a.m. UTC | #6
On Friday 27 May 2016 10:00:05 Thorsten Leemhuis wrote:
> Pali Rohár wrote on 26.05.2016 17:51:
> > On Thursday 26 May 2016 17:39:57 Thorsten Leemhuis wrote:
> >> > I need to know if this patch fixes problem on Dell Studio XPS 8000
> >> > and Dell Studio XPS 8100 machines, so we can revert git commits:
> >> > 6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
> >> > a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")
> >> Just checked: Sorry, that patch did not fix the CPU fan speed issue
> >> on my XPS 8000, so there is afaics no reason to revert 6220f4ebd7b4
> >> :-/
> > Please, can you apply patch "dell-smm-hwmon: In debug mode log duration 
> > of SMM calls" 
> 
> On top of "Cache fan_type() calls and use fan_status() for fan 
> detection" or with that patch reverted? I did the latter and 
> tested with 4.6 (just mentioning it here in case it matters).

Ideally both.

> > and compile dell-smm-hwmon in debug mode to tell us which 
> > SMM call takes too long? This is probably they key for that XPS 
> > problems...
> 
> [    5.941037] dell_smm_hwmon: smm(0xfea3 0x0000) = 0x4147  (took   14470 usecs)
> [    5.941139] dell_smm_hwmon: smm(0x11a3 0x0000) = 0xffff  (took      93 usecs)
> [    5.941236] dell_smm_hwmon: smm(0x11a3 0x0001) = 0xffff  (took      93 usecs)
> [    5.941331] dell_smm_hwmon: smm(0x11a3 0x0002) = 0xffff  (took      92 usecs)
> [    5.941427] dell_smm_hwmon: smm(0x11a3 0x0003) = 0xffff  (took      92 usecs)
> [    5.941526] dell_smm_hwmon: smm(0x03a3 0x0000) = 0x0000  (took      95 usecs)
> [    5.941625] dell_smm_hwmon: smm(0x03a3 0x0001) = 0x0001  (took      95 usecs)
> 
> HTH, CU, knurd

I do not see any long taking SMM call here. Are you really sure that
your machine freeze when loading or using dell-smm-hwmon?
Thorsten Leemhuis May 27, 2016, 10:01 a.m. UTC | #7
Pali Rohár wrote on 27.05.2016 11:47:
> On Friday 27 May 2016 10:00:05 Thorsten Leemhuis wrote:
>> Pali Rohár wrote on 26.05.2016 17:51:
>> > On Thursday 26 May 2016 17:39:57 Thorsten Leemhuis wrote:
>> >> > I need to know if this patch fixes problem on Dell Studio XPS 8000
>> >> > and Dell Studio XPS 8100 machines, so we can revert git commits:
>> >> > 6220f4ebd7b4 ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8000")
>> >> > a4b45b25f18d ("hwmon: (dell-smm) Blacklist Dell Studio XPS 8100")
>> >> Just checked: Sorry, that patch did not fix the CPU fan speed issue
>> >> on my XPS 8000, so there is afaics no reason to revert 6220f4ebd7b4
>> >> :-/
>> > Please, can you apply patch "dell-smm-hwmon: In debug mode log duration 
>> > of SMM calls" 
> […] 
> I do not see any long taking SMM call here. Are you really sure that
> your machine freeze when loading or using dell-smm-hwmon?

Uhh, sorry, it seems something got mixed up somewhere. As mentioned
earlier in this thread already: The Studio XPS 8000 I have here never
froze when loading or using dell-smm-hwmon. I only saw its CPU fan speed
going up and down as described in
https://bugzilla.kernel.org/show_bug.cgi?id=100121 From the mail that
started this thread I got the impression that all those problems are all
related somehow, so I went testing the patch as asked.

CU, knurd
--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pali Rohár May 27, 2016, 10:45 a.m. UTC | #8
On Friday 27 May 2016 12:01:21 Thorsten Leemhuis wrote:
> Pali Rohár wrote on 27.05.2016 11:47:
> > […]
> > I do not see any long taking SMM call here. Are you really sure
> > that your machine freeze when loading or using dell-smm-hwmon?
> 
> Uhh, sorry, it seems something got mixed up somewhere. As mentioned
> earlier in this thread already: The Studio XPS 8000 I have here never
> froze when loading or using dell-smm-hwmon. I only saw its CPU fan
> speed going up and down as described in
> https://bugzilla.kernel.org/show_bug.cgi?id=100121 From the mail that
> started this thread I got the impression that all those problems are
> all related somehow, so I went testing the patch as asked.

Looks like there are two different problems with dell-smm-hwmon driver:

1) Fan speed going randomly up and down without system freeze

2) System freeze at boot time and at that time fan speed goes up

I though that problem with fan speed is SMM freeze, but from your log I 
see that it is not truth. And my patch seems to fix problem 2).

So for problem 1) I need to know:

* Is it regression? I.e. is there kernel version under which i8k (or 
dell-smm-hwmon) is working?

* If it is regression, then I need somebody to fully fully bisect 
problematic commit. And then starts reverting it on top of new kernel 
versions (to check that only that commit cause problem 1)).
Pali Rohár June 13, 2016, 6:52 p.m. UTC | #9
On Sunday 22 May 2016 02:28:00 Pali Rohár wrote:
> On Sunday 22 May 2016 02:19:48 Guenter Roeck wrote:
> > On 05/21/2016 07:46 AM, Pali Rohár wrote:
> > > On more Dell machines (e.g. Dell Precision M3800)
> > > I8K_SMM_GET_FAN_TYPE call is too expensive (CPU is too long in
> > > SMM mode) and cause kernel to hang. This patch cache type for
> > > each fan (as it should not change) and change the way how fan
> > > presense is detected. It revert and use function fan_status() as
> > > was before commit f989e55452c7 ("i8k: Add support for fan
> > > labels").
> > > 
> > > Moreover, kernel hangs for 2 - 3 seconds only sometimes and only
> > > on some Dell machines. When kernel hangs fan speed is at max. So
> > > it was hard to debug and bisect where is root of this problem.
> > > It looks like this is bug in Dell BIOS which implement fan type
> > > SMM code... and there is no way how to fix it in kernel.
> > > 
> > > Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
> > > Reviewed-by: Jean Delvare <jdelvare@suse.de>
> > > Reported-and-tested-by: Tolga Cakir <cevelnet@gmail.com>
> > > Fixes: f989e55452c7 ("i8k: Add support for fan labels")
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=112021
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=100121
> > > Cc: stable@vger.kernel.org # v4.0+, will need backport
> > 
> > Should this patch be applied, or do you wait for more testing ?
> 
> I would like to hear some confirmation from people with affected
> machine.
> 

Ok, now after testing we know that kernel should prevent calling 
I8K_SMM_GET_FAN_TYPE on affected buggy Dell machines.

Looks like there are two different bugs in Dell SMM with 
I8K_SMM_GET_FAN_TYPE call.

First bug cause that kernel freeze for 2 - 3 seconds when 
I8K_SMM_GET_FAN_TYPE is issued.

Second bug cause that fan goes randomly up and down (that is controlled 
by Dell SMM) when I8K_SMM_GET_FAN_TYPE is issued. Normal behaviour is 
returned after machine reboots.

Some Dell machines are affected by first bug, some by second bug. And 
there are Dell machines without both bugs.

This my patch just partially fix first bug and prevent calling that call 
at boot time. But can be issued by sysfs (+value is cached, so it is 
called only once).

So question is: is my patch enough for fixing first bug?

And second question: how to fix second bug? I see only one option: 
Create machine blacklist with broken Dell SMM firmware and disallow 
calling I8K_SMM_GET_FAN_TYPE for them.
Guenter Roeck June 14, 2016, 2:02 a.m. UTC | #10
On 06/13/2016 11:52 AM, Pali Rohár wrote:
> On Sunday 22 May 2016 02:28:00 Pali Rohár wrote:
>> On Sunday 22 May 2016 02:19:48 Guenter Roeck wrote:
>>> On 05/21/2016 07:46 AM, Pali Rohár wrote:
>>>> On more Dell machines (e.g. Dell Precision M3800)
>>>> I8K_SMM_GET_FAN_TYPE call is too expensive (CPU is too long in
>>>> SMM mode) and cause kernel to hang. This patch cache type for
>>>> each fan (as it should not change) and change the way how fan
>>>> presense is detected. It revert and use function fan_status() as
>>>> was before commit f989e55452c7 ("i8k: Add support for fan
>>>> labels").
>>>>
>>>> Moreover, kernel hangs for 2 - 3 seconds only sometimes and only
>>>> on some Dell machines. When kernel hangs fan speed is at max. So
>>>> it was hard to debug and bisect where is root of this problem.
>>>> It looks like this is bug in Dell BIOS which implement fan type
>>>> SMM code... and there is no way how to fix it in kernel.
>>>>
>>>> Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
>>>> Reviewed-by: Jean Delvare <jdelvare@suse.de>
>>>> Reported-and-tested-by: Tolga Cakir <cevelnet@gmail.com>
>>>> Fixes: f989e55452c7 ("i8k: Add support for fan labels")
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=112021
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=100121
>>>> Cc: stable@vger.kernel.org # v4.0+, will need backport
>>>
>>> Should this patch be applied, or do you wait for more testing ?
>>
>> I would like to hear some confirmation from people with affected
>> machine.
>>
>
> Ok, now after testing we know that kernel should prevent calling
> I8K_SMM_GET_FAN_TYPE on affected buggy Dell machines.
>
> Looks like there are two different bugs in Dell SMM with
> I8K_SMM_GET_FAN_TYPE call.
>
> First bug cause that kernel freeze for 2 - 3 seconds when
> I8K_SMM_GET_FAN_TYPE is issued.
>
> Second bug cause that fan goes randomly up and down (that is controlled
> by Dell SMM) when I8K_SMM_GET_FAN_TYPE is issued. Normal behaviour is
> returned after machine reboots.
>
> Some Dell machines are affected by first bug, some by second bug. And
> there are Dell machines without both bugs.
>
> This my patch just partially fix first bug and prevent calling that call
> at boot time. But can be issued by sysfs (+value is cached, so it is
> called only once).
>
> So question is: is my patch enough for fixing first bug?
>
> And second question: how to fix second bug? I see only one option:
> Create machine blacklist with broken Dell SMM firmware and disallow
> calling I8K_SMM_GET_FAN_TYPE for them.
>
Or maybe a whitelist with known working systems ?

Guenter

--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pali Rohár June 15, 2016, 8:03 a.m. UTC | #11
On Monday 13 June 2016 19:02:14 Guenter Roeck wrote:
> >And second question: how to fix second bug? I see only one option:
> >Create machine blacklist with broken Dell SMM firmware and disallow
> >calling I8K_SMM_GET_FAN_TYPE for them.
> >
> Or maybe a whitelist with known working systems ?
> 
> Guenter

Smaller list is better, so I will try to collect blacklist and if it
will be big we can use whitelist.
diff mbox

Patch

diff --git a/drivers/hwmon/dell-smm-hwmon.c b/drivers/hwmon/dell-smm-hwmon.c
index c43318d..577445f 100644
--- a/drivers/hwmon/dell-smm-hwmon.c
+++ b/drivers/hwmon/dell-smm-hwmon.c
@@ -235,7 +235,7 @@  static int i8k_get_fan_speed(int fan)
 /*
  * Read the fan type.
  */
-static int i8k_get_fan_type(int fan)
+static int _i8k_get_fan_type(int fan)
 {
 	struct smm_regs regs = { .eax = I8K_SMM_GET_FAN_TYPE, };
 
@@ -243,6 +243,17 @@  static int i8k_get_fan_type(int fan)
 	return i8k_smm(&regs) ? : regs.eax & 0xff;
 }
 
+static int i8k_get_fan_type(int fan)
+{
+	/* I8K_SMM_GET_FAN_TYPE SMM call is expensive, so cache values */
+	static int types[2] = { INT_MIN, INT_MIN };
+
+	if (types[fan] == INT_MIN)
+		types[fan] = _i8k_get_fan_type(fan);
+
+	return types[fan];
+}
+
 /*
  * Read the fan nominal rpm for specific fan speed.
  */
@@ -767,13 +778,13 @@  static int __init i8k_init_hwmon(void)
 	if (err >= 0)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_TEMP4;
 
-	/* First fan attributes, if fan type is OK */
-	err = i8k_get_fan_type(0);
+	/* First fan attributes, if fan status is OK */
+	err = i8k_get_fan_status(0);
 	if (err >= 0)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_FAN1;
 
-	/* Second fan attributes, if fan type is OK */
-	err = i8k_get_fan_type(1);
+	/* Second fan attributes, if fan status is OK */
+	err = i8k_get_fan_status(1);
 	if (err >= 0)
 		i8k_hwmon_flags |= I8K_HWMON_HAVE_FAN2;