diff mbox series

[3/3] hwmon: (coretemp) Use a model-specific bitmask to read registers

Message ID 20240406010416.4821-4-ricardo.neri-calderon@linux.intel.com (mailing list archive)
State Superseded, archived
Headers show
Series drivers: thermal/hwmon: intel: Use model-specific bitmasks for temperature registers | expand

Commit Message

Ricardo Neri April 6, 2024, 1:04 a.m. UTC
The Intel Software Development manual defines states the temperature
digital readout as the bits [22:16] of the IA32_[PACKAGE]_THERM_STATUS
registers. In recent processor, however, the range is [23:16]. Use a
model-specific bitmask to extract the temperature readout correctly.

Instead of re-implementing model checks, extract the correct bitmask
using the intel_tcc library. Add an 'imply' weak reverse dependency on
CONFIG_INTEL_TCC. This captures the dependency and lets user to unselect
them if they are so inclined. In such case, the bitmask used for the
digital readout is [22:16] as specified in the Intel Software Developer's
manual.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: linux-hwmon@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # v6.7+
---
 drivers/hwmon/Kconfig    | 1 +
 drivers/hwmon/coretemp.c | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Guenter Roeck April 6, 2024, 1:17 p.m. UTC | #1
On Fri, Apr 05, 2024 at 06:04:16PM -0700, Ricardo Neri wrote:
> The Intel Software Development manual defines states the temperature
> digital readout as the bits [22:16] of the IA32_[PACKAGE]_THERM_STATUS
> registers. In recent processor, however, the range is [23:16]. Use a
> model-specific bitmask to extract the temperature readout correctly.
> 
> Instead of re-implementing model checks, extract the correct bitmask
> using the intel_tcc library. Add an 'imply' weak reverse dependency on
> CONFIG_INTEL_TCC. This captures the dependency and lets user to unselect
> them if they are so inclined. In such case, the bitmask used for the
> digital readout is [22:16] as specified in the Intel Software Developer's
> manual.
> 
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: linux-hwmon@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org # v6.7+
> ---
>  drivers/hwmon/Kconfig    | 1 +
>  drivers/hwmon/coretemp.c | 6 +++++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index 83945397b6eb..11d72b3009bf 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -847,6 +847,7 @@ config SENSORS_I5500
>  config SENSORS_CORETEMP
>  	tristate "Intel Core/Core2/Atom temperature sensor"
>  	depends on X86
> +	imply INTEL_TCC

I do not think it is appropriate to make a hardware monitoring driver
depend on the thermal subsystem.

NAK in the current form.

Guenter
Zhang, Rui April 7, 2024, 8:24 a.m. UTC | #2
On Fri, 2024-04-05 at 18:04 -0700, Ricardo Neri wrote:
> The Intel Software Development manual defines states the temperature

I failed to parse this, is the above "states" redundant?

[...]

> digital readout as the bits [22:16] of the
> IA32_[PACKAGE]_THERM_STATUS
> registers. In recent processor, however, the range is [23:16]. Use a
> model-specific bitmask to extract the temperature readout correctly.
> 
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 616bd1a5b864..5632e1b1dfb1 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -17,6 +17,7 @@
>  #include <linux/sysfs.h>
>  #include <linux/hwmon-sysfs.h>
>  #include <linux/err.h>
> +#include <linux/intel_tcc.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/platform_device.h>
> @@ -404,6 +405,8 @@ static ssize_t show_temp(struct device *dev,
>         tjmax = get_tjmax(tdata, dev);
>         /* Check whether the time interval has elapsed */
>         if (time_after(jiffies, tdata->last_updated + HZ)) {
> +               u32 mask =
> intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
> +
>                 rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax,
> &edx);
>                 /*
>                  * Ignore the valid bit. In all observed cases the
> register
> @@ -411,7 +414,7 @@ static ssize_t show_temp(struct device *dev,
>                  * Return it instead of reporting an error which
> doesn't
>                  * really help at all.
>                  */
> -               tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
> +               tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
>                 tdata->last_updated = jiffies;
>         }
> 
Besides this one, we can also convert to use intel_tcc_get_tjmax() in
get_tjmax().

thanks,
rui
Zhang, Rui April 7, 2024, 8:39 a.m. UTC | #3
On Sat, 2024-04-06 at 06:17 -0700, Guenter Roeck wrote:
> On Fri, Apr 05, 2024 at 06:04:16PM -0700, Ricardo Neri wrote:
> > The Intel Software Development manual defines states the
> > temperature
> > digital readout as the bits [22:16] of the
> > IA32_[PACKAGE]_THERM_STATUS
> > registers. In recent processor, however, the range is [23:16]. Use
> > a
> > model-specific bitmask to extract the temperature readout
> > correctly.
> > 
> > Instead of re-implementing model checks, extract the correct
> > bitmask
> > using the intel_tcc library. Add an 'imply' weak reverse dependency
> > on
> > CONFIG_INTEL_TCC. This captures the dependency and lets user to
> > unselect
> > them if they are so inclined. In such case, the bitmask used for
> > the
> > digital readout is [22:16] as specified in the Intel Software
> > Developer's
> > manual.
> > 
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> > Cc: Lukasz Luba <lukasz.luba@arm.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: linux-hwmon@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: stable@vger.kernel.org # v6.7+
> > ---
> >  drivers/hwmon/Kconfig    | 1 +
> >  drivers/hwmon/coretemp.c | 6 +++++-
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index 83945397b6eb..11d72b3009bf 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -847,6 +847,7 @@ config SENSORS_I5500
> >  config SENSORS_CORETEMP
> >         tristate "Intel Core/Core2/Atom temperature sensor"
> >         depends on X86
> > +       imply INTEL_TCC
> 
> I do not think it is appropriate to make a hardware monitoring driver
> depend on the thermal subsystem.
> 
> NAK in the current form.
> 
Hi, Guenter,

Thanks for reviewing.

We've seen a couple of hwmon drivers depends on or imply THERMAL.
That's why we think this is an applicable solution.
Using the intel_tcc APIs can effectively reduce the future maintenance
burden because we don't need to duplicate the model list in multiple
places.

or do you have any suggestions?

thanks,
rui
Guenter Roeck April 8, 2024, 11:40 a.m. UTC | #4
On Sun, Apr 07, 2024 at 08:39:51AM +0000, Zhang, Rui wrote:
> > I do not think it is appropriate to make a hardware monitoring driver
> > depend on the thermal subsystem.
> > 
> > NAK in the current form.
> > 
> Hi, Guenter,
> 
> Thanks for reviewing.
> 
> We've seen a couple of hwmon drivers depends on or imply THERMAL.
> That's why we think this is an applicable solution.

So far this was - unless someone sneaked something by - for drivers
which registered thermal zones, not for calling code which resides
inside thermal subsystem.

> Using the intel_tcc APIs can effectively reduce the future maintenance
> burden because we don't need to duplicate the model list in multiple
> places.
> 
> or do you have any suggestions?

The exported code should reside outside the thermal subsystem.

Also, as implemented, if INTEL_TCC=n, the returned temperature mask value
is 0x7f, and the offset mask is 0. So the alternative would be to just use
those values unconditionally since apparently that is sufficient.

Thanks,
Guenter
Ricardo Neri April 15, 2024, 1:19 a.m. UTC | #5
On Sun, Apr 07, 2024 at 08:24:40AM +0000, Zhang, Rui wrote:
> On Fri, 2024-04-05 at 18:04 -0700, Ricardo Neri wrote:
> > The Intel Software Development manual defines states the temperature
> 
> I failed to parse this, is the above "states" redundant?

Sorry Rui! I missed this repy.

Ah, the commit message is wrong. I will do s/defines//

> 
> [...]
> 
> > digital readout as the bits [22:16] of the
> > IA32_[PACKAGE]_THERM_STATUS
> > registers. In recent processor, however, the range is [23:16]. Use a
> > model-specific bitmask to extract the temperature readout correctly.
> > 
> > diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> > index 616bd1a5b864..5632e1b1dfb1 100644
> > --- a/drivers/hwmon/coretemp.c
> > +++ b/drivers/hwmon/coretemp.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/sysfs.h>
> >  #include <linux/hwmon-sysfs.h>
> >  #include <linux/err.h>
> > +#include <linux/intel_tcc.h>
> >  #include <linux/mutex.h>
> >  #include <linux/list.h>
> >  #include <linux/platform_device.h>
> > @@ -404,6 +405,8 @@ static ssize_t show_temp(struct device *dev,
> >         tjmax = get_tjmax(tdata, dev);
> >         /* Check whether the time interval has elapsed */
> >         if (time_after(jiffies, tdata->last_updated + HZ)) {
> > +               u32 mask =
> > intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
> > +
> >                 rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax,
> > &edx);
> >                 /*
> >                  * Ignore the valid bit. In all observed cases the
> > register
> > @@ -411,7 +414,7 @@ static ssize_t show_temp(struct device *dev,
> >                  * Return it instead of reporting an error which
> > doesn't
> >                  * really help at all.
> >                  */
> > -               tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
> > +               tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
> >                 tdata->last_updated = jiffies;
> >         }
> > 
> Besides this one, we can also convert to use intel_tcc_get_tjmax() in
> get_tjmax().

I thought about this, but realized that the bitmask of TjMax is always
[23:16]; no need for a model check. If anything, intel_tcc_get_tjmax()
would remove some duplicated code. But coretemp.c would need to depend
on INTEL_TCC, which seems to be a non-starter.
Guenter Roeck April 15, 2024, 12:42 p.m. UTC | #6
On Sun, Apr 14, 2024 at 06:19:46PM -0700, Ricardo Neri wrote:
> On Sun, Apr 07, 2024 at 08:24:40AM +0000, Zhang, Rui wrote:
> > On Fri, 2024-04-05 at 18:04 -0700, Ricardo Neri wrote:
> > > The Intel Software Development manual defines states the temperature
> > 
> > I failed to parse this, is the above "states" redundant?
> 
> Sorry Rui! I missed this repy.
> 
> Ah, the commit message is wrong. I will do s/defines//
> 
> > 
> > [...]
> > 
> > > digital readout as the bits [22:16] of the
> > > IA32_[PACKAGE]_THERM_STATUS
> > > registers. In recent processor, however, the range is [23:16]. Use a
> > > model-specific bitmask to extract the temperature readout correctly.
> > > 
> > > diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> > > index 616bd1a5b864..5632e1b1dfb1 100644
> > > --- a/drivers/hwmon/coretemp.c
> > > +++ b/drivers/hwmon/coretemp.c
> > > @@ -17,6 +17,7 @@
> > >  #include <linux/sysfs.h>
> > >  #include <linux/hwmon-sysfs.h>
> > >  #include <linux/err.h>
> > > +#include <linux/intel_tcc.h>
> > >  #include <linux/mutex.h>
> > >  #include <linux/list.h>
> > >  #include <linux/platform_device.h>
> > > @@ -404,6 +405,8 @@ static ssize_t show_temp(struct device *dev,
> > >         tjmax = get_tjmax(tdata, dev);
> > >         /* Check whether the time interval has elapsed */
> > >         if (time_after(jiffies, tdata->last_updated + HZ)) {
> > > +               u32 mask =
> > > intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
> > > +
> > >                 rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax,
> > > &edx);
> > >                 /*
> > >                  * Ignore the valid bit. In all observed cases the
> > > register
> > > @@ -411,7 +414,7 @@ static ssize_t show_temp(struct device *dev,
> > >                  * Return it instead of reporting an error which
> > > doesn't
> > >                  * really help at all.
> > >                  */
> > > -               tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
> > > +               tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
> > >                 tdata->last_updated = jiffies;
> > >         }
> > > 
> > Besides this one, we can also convert to use intel_tcc_get_tjmax() in
> > get_tjmax().
> 
> I thought about this, but realized that the bitmask of TjMax is always
> [23:16]; no need for a model check. If anything, intel_tcc_get_tjmax()
> would remove some duplicated code. But coretemp.c would need to depend
> on INTEL_TCC, which seems to be a non-starter.
> 

Calling intel_tcc_get_temp_mask() in practice already introduces that
dependency because it returns a fixed mask if INTEL_TCC is not enabled.
If that doesn't matter, the dynamic mask is unnecessary to start with.

Guenter
diff mbox series

Patch

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index 83945397b6eb..11d72b3009bf 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -847,6 +847,7 @@  config SENSORS_I5500
 config SENSORS_CORETEMP
 	tristate "Intel Core/Core2/Atom temperature sensor"
 	depends on X86
+	imply INTEL_TCC
 	help
 	  If you say yes here you get support for the temperature
 	  sensor inside your CPU. Most of the family 6 CPUs
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 616bd1a5b864..5632e1b1dfb1 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -17,6 +17,7 @@ 
 #include <linux/sysfs.h>
 #include <linux/hwmon-sysfs.h>
 #include <linux/err.h>
+#include <linux/intel_tcc.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/platform_device.h>
@@ -404,6 +405,8 @@  static ssize_t show_temp(struct device *dev,
 	tjmax = get_tjmax(tdata, dev);
 	/* Check whether the time interval has elapsed */
 	if (time_after(jiffies, tdata->last_updated + HZ)) {
+		u32 mask = intel_tcc_get_temp_mask(is_pkg_temp_data(tdata));
+
 		rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax, &edx);
 		/*
 		 * Ignore the valid bit. In all observed cases the register
@@ -411,7 +414,7 @@  static ssize_t show_temp(struct device *dev,
 		 * Return it instead of reporting an error which doesn't
 		 * really help at all.
 		 */
-		tdata->temp = tjmax - ((eax >> 16) & 0x7f) * 1000;
+		tdata->temp = tjmax - ((eax >> 16) & mask) * 1000;
 		tdata->last_updated = jiffies;
 	}
 
@@ -838,4 +841,5 @@  module_exit(coretemp_exit)
 
 MODULE_AUTHOR("Rudolf Marek <r.marek@assembler.cz>");
 MODULE_DESCRIPTION("Intel Core temperature monitor");
+MODULE_IMPORT_NS(INTEL_TCC);
 MODULE_LICENSE("GPL");