diff mbox

hwmon: (k10temp) Add support for family 17h

Message ID 1504575935-19476-1-git-send-email-linux@roeck-us.net
State Superseded
Headers show

Commit Message

Guenter Roeck Sept. 5, 2017, 1:45 a.m. UTC
Add support for temperature sensors on Family 17h (Ryzen) processors.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
Some of this is guesswork, but afaics it is working. No idea if there
is a better way to determine the temperature offset.

 drivers/hwmon/k10temp.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

Comments

Clemens Ladisch Sept. 5, 2017, 6:47 a.m. UTC | #1
Guenter Roeck wrote:
> Add support for temperature sensors on Family 17h (Ryzen) processors.
>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
> Some of this is guesswork, but afaics it is working. No idea if there
> is a better way to determine the temperature offset.

The reported value is not an absolute temperature on any CPU.

As far as I know, the offset is not guaranteed to be fixed for any model,
i.e., it would be pointless to apply the offset observed on one specific
chip.

> @@ -25,6 +25,10 @@
>  #include <linux/pci.h>
>  #include <asm/processor.h>
>
> +#ifndef PCI_DEVICE_ID_AMD_17H_DF_F3
> +#define PCI_DEVICE_ID_AMD_17H_DF_F3	0x1463
> +#endif
> +

Please move this down to the other symbols.


Regards,
Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guenter Roeck Sept. 5, 2017, 1:35 p.m. UTC | #2
On 09/04/2017 11:47 PM, Clemens Ladisch wrote:
> Guenter Roeck wrote:
>> Add support for temperature sensors on Family 17h (Ryzen) processors.
>>
>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> ---
>> Some of this is guesswork, but afaics it is working. No idea if there
>> is a better way to determine the temperature offset.
> 
> The reported value is not an absolute temperature on any CPU.
> 
> As far as I know, the offset is not guaranteed to be fixed for any model,
> i.e., it would be pointless to apply the offset observed on one specific
> chip.
> 

What we should do then, as we did for coretemp, would be to collect the various
temperature offsets (and temperature limits, for that matter) and apply per-CPU
adjustments. Are the offsets documented somewhere ?

>> @@ -25,6 +25,10 @@
>>   #include <linux/pci.h>
>>   #include <asm/processor.h>
>>
>> +#ifndef PCI_DEVICE_ID_AMD_17H_DF_F3
>> +#define PCI_DEVICE_ID_AMD_17H_DF_F3	0x1463
>> +#endif
>> +
> 
> Please move this down to the other symbols.
> 
Done.

Thanks,
Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Clemens Ladisch Sept. 5, 2017, 2:12 p.m. UTC | #3
Guenter Roeck wrote:
> On 09/04/2017 11:47 PM, Clemens Ladisch wrote:
>> Guenter Roeck wrote:
>>> Some of this is guesswork, but afaics it is working. No idea if there
>>> is a better way to determine the temperature offset.
>>
>> The reported value is not an absolute temperature on any CPU.
>>
>> As far as I know, the offset is not guaranteed to be fixed for any model,
>> i.e., it would be pointless to apply the offset observed on one specific
>> chip.
>
> What we should do then, as we did for coretemp, would be to collect the various
> temperature offsets (and temperature limits, for that matter) and apply per-CPU
> adjustments. Are the offsets documented somewhere ?

AMD says:
 "Tctl is the processor temperature control value, used by the platform to
  control cooling systems. Tctl is a non-physical temperature on an
  arbitrary scale measured in degrees. It does _not_ represent an actual
  physical temperature like die or case temperature. Instead, it specifies
  the processor temperature relative to the point at which the system must
  supply the maximum cooling for the processor's specified maximum case
  temperature and maximum thermal power dissipation."

(That point is defined as 70.)


Regards,
Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guenter Roeck Sept. 5, 2017, 4:45 p.m. UTC | #4
On Tue, Sep 05, 2017 at 04:12:07PM +0200, Clemens Ladisch wrote:
> Guenter Roeck wrote:
> > On 09/04/2017 11:47 PM, Clemens Ladisch wrote:
> >> Guenter Roeck wrote:
> >>> Some of this is guesswork, but afaics it is working. No idea if there
> >>> is a better way to determine the temperature offset.
> >>
> >> The reported value is not an absolute temperature on any CPU.
> >>
> >> As far as I know, the offset is not guaranteed to be fixed for any model,
> >> i.e., it would be pointless to apply the offset observed on one specific
> >> chip.
> >
> > What we should do then, as we did for coretemp, would be to collect the various
> > temperature offsets (and temperature limits, for that matter) and apply per-CPU
> > adjustments. Are the offsets documented somewhere ?
> 
> AMD says:
>  "Tctl is the processor temperature control value, used by the platform to
>   control cooling systems. Tctl is a non-physical temperature on an
>   arbitrary scale measured in degrees. It does _not_ represent an actual
>   physical temperature like die or case temperature. Instead, it specifies
>   the processor temperature relative to the point at which the system must
>   supply the maximum cooling for the processor's specified maximum case
>   temperature and maximum thermal power dissipation."
> 


Pretty much the same as Intel. That doesn't mean we should not (try to) report
the real temperature as good as we can, as at least most of the BIOSes do,
and as all the Windows tools do, and as users expect us to do.

Do we really have to argue about this ? If you insist, I'll drop the
adjustments and refer all resulting inquiries to you.

Thanks,
Guenter
--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Clemens Ladisch Sept. 5, 2017, 5:48 p.m. UTC | #5
Guenter Roeck wrote:
> On Tue, Sep 05, 2017 at 04:12:07PM +0200, Clemens Ladisch wrote:
>> Guenter Roeck wrote:
>>> What we should do then, as we did for coretemp, would be to collect the various
>>> temperature offsets (and temperature limits, for that matter) and apply per-CPU
>>> adjustments. Are the offsets documented somewhere ?
>>
>> AMD says:
>>  "Tctl is the processor temperature control value, used by the platform to
>>   control cooling systems. Tctl is a non-physical temperature on an
>>   arbitrary scale measured in degrees. It does _not_ represent an actual
>>   physical temperature like die or case temperature. Instead, it specifies
>>   the processor temperature relative to the point at which the system must
>>   supply the maximum cooling for the processor's specified maximum case
>>   temperature and maximum thermal power dissipation."
>
> Pretty much the same as Intel. That doesn't mean we should not (try to) report
> the real temperature as good as we can, as at least most of the BIOSes do,

AFAIK the BIOSes use the thermal diode (with external circuitry) for that.

> and as all the Windows tools do, and as users expect us to do.
>
> Do we really have to argue about this ?

Looking at coretemp, this is going to be a maintenance nightmare.

Oh well.  If you insist, please add a proper chip-to-offset database, and
apply the offset to all four values.


Regards,
Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-hwmon" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index ce3b91f22e30..da8fec89020e 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -25,6 +25,10 @@ 
 #include <linux/pci.h>
 #include <asm/processor.h>
 
+#ifndef PCI_DEVICE_ID_AMD_17H_DF_F3
+#define PCI_DEVICE_ID_AMD_17H_DF_F3	0x1463
+#endif
+
 MODULE_DESCRIPTION("AMD Family 10h+ CPU core temperature monitor");
 MODULE_AUTHOR("Clemens Ladisch <clemens@ladisch.de>");
 MODULE_LICENSE("GPL");
@@ -61,31 +65,46 @@  static DEFINE_MUTEX(nb_smu_ind_mutex);
  */
 #define F15H_M60H_REPORTED_TEMP_CTRL_OFFSET	0xd8200ca4
 
-static void amd_nb_smu_index_read(struct pci_dev *pdev, unsigned int devfn,
-				  int offset, u32 *val)
+/* F17h M01h Access througn SMN */
+#define F17H_M01H_REPORTED_TEMP_CTRL_OFFSET	0x00059800
+
+static void amd_nb_index_read(struct pci_dev *pdev, unsigned int devfn,
+			      unsigned int base, int offset, u32 *val)
 {
 	mutex_lock(&nb_smu_ind_mutex);
 	pci_bus_write_config_dword(pdev->bus, devfn,
-				   0xb8, offset);
+				   base, offset);
 	pci_bus_read_config_dword(pdev->bus, devfn,
-				  0xbc, val);
+				  base + 4, val);
 	mutex_unlock(&nb_smu_ind_mutex);
 }
 
 static ssize_t temp1_input_show(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-	u32 regval;
 	struct pci_dev *pdev = dev_get_drvdata(dev);
-
-	if (boot_cpu_data.x86 == 0x15 && boot_cpu_data.x86_model == 0x60) {
-		amd_nb_smu_index_read(pdev, PCI_DEVFN(0, 0),
-				      F15H_M60H_REPORTED_TEMP_CTRL_OFFSET,
-				      &regval);
+	u32 regval;
+	u32 temp;
+
+	if (boot_cpu_data.x86 == 0x15 && (boot_cpu_data.x86_model == 0x60 ||
+					  boot_cpu_data.x86_model == 0x70)) {
+		amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0xb8,
+				  F15H_M60H_REPORTED_TEMP_CTRL_OFFSET, &regval);
+	} else if (boot_cpu_data.x86 == 0x17) {
+		amd_nb_index_read(pdev, PCI_DEVFN(0, 0), 0x60,
+				  F17H_M01H_REPORTED_TEMP_CTRL_OFFSET, &regval);
 	} else {
 		pci_read_config_dword(pdev, REG_REPORTED_TEMPERATURE, &regval);
 	}
-	return sprintf(buf, "%u\n", (regval >> 21) * 125);
+
+	temp = (regval >> 21) * 125;
+	/* Ryzen 1700X and 1800X require manually applied temperature offset */
+	if (boot_cpu_data.x86_model_id &&
+	    (strstr(boot_cpu_data.x86_model_id, "AMD Ryzen 7 1700X") ||
+	     strstr(boot_cpu_data.x86_model_id, "AMD Ryzen 7 1800X")))
+		temp -= 20000;
+
+	return sprintf(buf, "%u\n", temp);
 }
 
 static ssize_t temp1_max_show(struct device *dev,
@@ -214,6 +233,7 @@  static const struct pci_device_id k10temp_id_table[] = {
 	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_15H_M60H_NB_F3) },
 	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_16H_NB_F3) },
 	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) },
+	{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_17H_DF_F3) },
 	{}
 };
 MODULE_DEVICE_TABLE(pci, k10temp_id_table);