[v2,0/5] hwmon: k10temp driver improvements

Message ID 20200118172615.26329-1-linux@roeck-us.net (mailing list archive)

Message

Guenter Roeck Jan. 18, 2020, 5:26 p.m. UTC
This patch series implements various improvements for the k10temp driver.

Patch 1/5 introduces the use of bit operations.

Patch 2/5 converts the driver to use the devm_hwmon_device_register_with_info
API. This not only simplifies the code and reduces its size, it also
makes the code easier to maintain and enhance. 

Patch 3/5 adds support for reporting Core Complex Die (CCD) temperatures
on 3rd generation Ryzen (Zen2) CPUs.

Patch 4/5 adds support for reporting core and SoC current and voltage
information on Ryzen CPUs.

Patch 5/5 removes the maximum temperature from Tdie for Ryzen CPUs.
It is inaccurate, misleading, and it just doesn't make sense to report
wrong information.

With all patches in place, output on Ryzen 3900X CPUs looks as follows
(with the system under load).

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.36 V
Vsoc:         +1.18 V
Tdie:         +86.8°C
Tctl:         +86.8°C
Tccd1:        +80.0°C
Tccd2:        +81.8°C
Icore:       +44.14 A
Isoc:        +13.83 A
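
For anyone who wants to post-process such reports, here is a minimal Python
sketch that parses the readings above and multiplies voltage by current to
get a rough per-rail power figure. The helper names (parse_readings,
rail_power) are made up for this illustration, and the V * I product is only
an estimate derived from the displayed values, not something the driver
reports.

```python
import re

# Sample text copied from the report above; on a live system it would
# come from running the `sensors` command instead.
SAMPLE = """\
k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.36 V
Vsoc:         +1.18 V
Tdie:         +86.8°C
Tctl:         +86.8°C
Tccd1:        +80.0°C
Tccd2:        +81.8°C
Icore:       +44.14 A
Isoc:        +13.83 A
"""

# Matches "label: +value unit" lines; the unit is V, A, or °C in this output.
LINE_RE = re.compile(r"^(\w+):\s+([+-]?\d+(?:\.\d+)?)\s*(V|A|°C)")


def parse_readings(text):
    """Return {label: (value, unit)} for every reading line in text."""
    readings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            readings[m.group(1)] = (float(m.group(2)), m.group(3))
    return readings


def rail_power(readings, rail):
    """Estimate power in watts for a rail ("core" or "soc") as V * I."""
    volts, _ = readings["V" + rail]
    amps, _ = readings["I" + rail]
    return volts * amps


if __name__ == "__main__":
    r = parse_readings(SAMPLE)
    print(f"core: {rail_power(r, 'core'):.1f} W, soc: {rail_power(r, 'soc'):.1f} W")
```

Header lines such as "Adapter: PCI adapter" carry no numeric value and are
skipped by the pattern.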

The voltage and current information is limited to Ryzen CPUs. Voltage
and current reporting on Threadripper and EPYC CPUs is different, and the
reported information is either incomplete or wrong. Exclude it for the time
being; it can always be added if/when more information becomes available.

Tested with the following Ryzen CPUs:
    1300X A user with this CPU in the system reported somewhat unexpected
          values for Vcore; it is not clear why that is the case.
          Overall this does not warrant holding up the series.
    1600
    1800X
    2200G
    2400G
    3800X
    3900X
    3950X

v2: Added tested-by: tags as received.
    Don't display voltage and current information for Threadripper and EPYC.
    Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
    for Tdie on model 17h/18h CPUs.

Comments

Ken Moffat Jan. 19, 2020, 12:33 a.m. UTC | #1
On Sat, 18 Jan 2020 at 17:26, Guenter Roeck <linux@roeck-us.net> wrote:
>
> This patch series implements various improvements for the k10temp driver.
>
> Patch 1/5 introduces the use of bit operations.
>
> Patch 2/5 converts the driver to use the devm_hwmon_device_register_with_info
> API. This not only simplifies the code and reduces its size, it also
> makes the code easier to maintain and enhance.
>
> Patch 3/5 adds support for reporting Core Complex Die (CCD) temperatures
> on Ryzen 3 (Zen2) CPUs.
>
> Patch 4/5 adds support for reporting core and SoC current and voltage
> information on Ryzen CPUs.
>
> Patch 5/5 removes the maximum temperature from Tdie for Ryzen CPUs.
> It is inaccurate, misleading, and it just doesn't make sense to report
> wrong information.
>
> With all patches in place, output on Ryzen 3900X CPUs looks as follows
> (with the system under load).
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:        +1.36 V
> Vsoc:         +1.18 V
> Tdie:         +86.8°C
> Tctl:         +86.8°C
> Tccd1:        +80.0°C
> Tccd2:        +81.8°C
> Icore:       +44.14 A
> Isoc:        +13.83 A
>
> The voltage and current information is limited to Ryzen CPUs. Voltage
> and current reporting on Threadripper and EPYC CPUs is different, and the
> reported information is either incomplete or wrong. Exclude it for the time
> being; it can always be added if/when more information becomes available.
>
> Tested with the following Ryzen CPUs:
>     1300X A user with this CPU in the system reported somewhat unexpected
>           values for Vcore; it isn't entirely if at all clear why that is
>           the case. Overall this does not warrant holding up the series.

As the owner of that machine, very much agreed.

>     1600
>     1800X
>     2200G
>     2400G
>     3800X
>     3900X
>     3950X
>

I also had sensible results for v1 on 2500U and 3400G

> v2: Added tested-by: tags as received.
>     Don't display voltage and current information for Threadripper and EPYC.
>     Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
>     for Tdie on model 17h/18h CPUs.

For v2 on my 2500U, system idle and then under load -

--- k10temp-idle 2020-01-19 00:16:18.812002121 +0000
+++ k10temp-load 2020-01-19 00:22:05.595470877 +0000
@@ -1,15 +1,15 @@
 k10temp-pci-00c3
 Adapter: PCI adapter
-Vcore:        +0.98 V
+Vcore:        +1.15 V
 Vsoc:         +0.93 V
-Tdie:         +38.2°C
-Tctl:         +38.2°C
-Icore:       +10.39 A
-Isoc:         +6.49 A
+Tdie:         +76.2°C
+Tctl:         +76.2°C
+Icore:       +51.96 A
+Isoc:         +7.58 A

 amdgpu-pci-0300
 Adapter: PCI adapter
 vddgfx:           N/A
 vddnb:            N/A
-edge:         +38.0°C  (crit = +80.0°C, hyst =  +0.0°C)
+edge:         +76.0°C  (crit = +80.0°C, hyst =  +0.0°C)
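
The idle/load comparison above can also be summarized as per-sensor deltas.
A minimal Python sketch follows, with the k10temp values transcribed by hand
from the diff; the function names are hypothetical and only serve this
illustration.

```python
# Readings transcribed from the idle/load diff above (2500U, v2 series).
IDLE = {"Vcore": 0.98, "Vsoc": 0.93, "Tdie": 38.2, "Tctl": 38.2,
        "Icore": 10.39, "Isoc": 6.49}
LOAD = {"Vcore": 1.15, "Vsoc": 0.93, "Tdie": 76.2, "Tctl": 76.2,
        "Icore": 51.96, "Isoc": 7.58}


def deltas(before, after):
    """Return {label: after - before} for labels present in both readings."""
    return {k: round(after[k] - before[k], 2) for k in before if k in after}


def changed(before, after):
    """Return the labels whose value differs, in the order of `before`."""
    return [k for k, d in deltas(before, after).items() if d != 0]


if __name__ == "__main__":
    for label, d in deltas(IDLE, LOAD).items():
        print(f"{label:6s} {d:+.2f}")
```

Rounding to two decimals matches the precision of the displayed values and
avoids floating-point noise in the differences.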

I'll only test v2 on the 3400G if you think the results would add something.

ĸen
Guenter Roeck Jan. 19, 2020, 12:48 a.m. UTC | #2
On 1/18/20 4:33 PM, Ken Moffat wrote:
> On Sat, 18 Jan 2020 at 17:26, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> This patch series implements various improvements for the k10temp driver.
>>
>> Patch 1/5 introduces the use of bit operations.
>>
>> Patch 2/5 converts the driver to use the devm_hwmon_device_register_with_info
>> API. This not only simplifies the code and reduces its size, it also
>> makes the code easier to maintain and enhance.
>>
>> Patch 3/5 adds support for reporting Core Complex Die (CCD) temperatures
>> on Ryzen 3 (Zen2) CPUs.
>>
>> Patch 4/5 adds support for reporting core and SoC current and voltage
>> information on Ryzen CPUs.
>>
>> Patch 5/5 removes the maximum temperature from Tdie for Ryzen CPUs.
>> It is inaccurate, misleading, and it just doesn't make sense to report
>> wrong information.
>>
>> With all patches in place, output on Ryzen 3900X CPUs looks as follows
>> (with the system under load).
>>
>> k10temp-pci-00c3
>> Adapter: PCI adapter
>> Vcore:        +1.36 V
>> Vsoc:         +1.18 V
>> Tdie:         +86.8°C
>> Tctl:         +86.8°C
>> Tccd1:        +80.0°C
>> Tccd2:        +81.8°C
>> Icore:       +44.14 A
>> Isoc:        +13.83 A
>>
>> The voltage and current information is limited to Ryzen CPUs. Voltage
>> and current reporting on Threadripper and EPYC CPUs is different, and the
>> reported information is either incomplete or wrong. Exclude it for the time
>> being; it can always be added if/when more information becomes available.
>>
>> Tested with the following Ryzen CPUs:
>>      1300X A user with this CPU in the system reported somewhat unexpected
>>            values for Vcore; it isn't entirely if at all clear why that is
>>            the case. Overall this does not warrant holding up the series.
> 
> As the owner of that machine, very much agreed.
>
>>      1600
>>      1800X
>>      2200G
>>      2400G
>>      3800X
>>      3900X
>>      3950X
>>
> 
> I also had sensible results for v1 on 2500U and 3400G
> 
Sorry, I somehow missed that.

>> v2: Added tested-by: tags as received.
>>      Don't display voltage and current information for Threadripper and EPYC.
>>      Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
>>      for Tdie on model 17h/18h CPUs.
> 
> For v2 on my 2500U, system idle and then under load -
> 
> --- k10temp-idle 2020-01-19 00:16:18.812002121 +0000
> +++ k10temp-load 2020-01-19 00:22:05.595470877 +0000
> @@ -1,15 +1,15 @@
>   k10temp-pci-00c3
>   Adapter: PCI adapter
> -Vcore:        +0.98 V
> +Vcore:        +1.15 V
>   Vsoc:         +0.93 V
> -Tdie:         +38.2°C
> -Tctl:         +38.2°C
> -Icore:       +10.39 A
> -Isoc:         +6.49 A
> +Tdie:         +76.2°C
> +Tctl:         +76.2°C
> +Icore:       +51.96 A
> +Isoc:         +7.58 A
> 
>   amdgpu-pci-0300
>   Adapter: PCI adapter
>   vddgfx:           N/A
>   vddnb:            N/A
> -edge:         +38.0°C  (crit = +80.0°C, hyst =  +0.0°C)
> +edge:         +76.0°C  (crit = +80.0°C, hyst =  +0.0°C)
> 
> I'll only test v2 on the 3400G if you think the results would add something.
> 

Thanks a lot for the additional testing! I don't think we need another
test on 3400G; after all, the actual measurement code didn't change.

Everyone: I'll be happy to add Tested-by: tags with your name and e-mail
address to the series, but you'll have to send it to me. I appreciate
all your testing and would like to acknowledge it, but I can not add
Tested-by: tags (or any other tags, for that matter) on my own.

Thanks,
Guenter
Ken Moffat Jan. 19, 2020, 1:08 a.m. UTC | #3
On Sun, 19 Jan 2020 at 00:49, Guenter Roeck <linux@roeck-us.net> wrote:
> Everyone: I'll be happy to add Tested-by: tags with your name and e-mail
> address to the series, but you'll have to send it to me. I appreciate
> all your testing and would like to acknowledge it, but I can not add
> Tested-by: tags (or any other tags, for that matter) on my own.
>
> Thanks,
> Guenter

For the little it is worth:
Tested-by: Ken Moffat <zarniwhoop73@googlemail.com>
Brad Campbell Jan. 19, 2020, 3:13 a.m. UTC | #4
On 19/1/20 8:48 am, Guenter Roeck wrote:
> Everyone: I'll be happy to add Tested-by: tags with your name and e-mail
> address to the series, but you'll have to send it to me. I appreciate
> all your testing and would like to acknowledge it, but I can not add
> Tested-by: tags (or any other tags, for that matter) on my own.
> 
> Thanks,
> Guenter
> 

Tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Jonathan McDowell Jan. 19, 2020, 10:18 a.m. UTC | #5
In article <20200118172615.26329-1-linux@roeck-us.net> (earth.lists.linux-kernel) you wrote:
> This patch series implements various improvements for the k10temp driver.
...
> The voltage and current information is limited to Ryzen CPUs. Voltage
> and current reporting on Threadripper and EPYC CPUs is different, and the
> reported information is either incomplete or wrong. Exclude it for the time
> being; it can always be added if/when more information becomes available.

> Tested with the following Ryzen CPUs:

Tested-By: Jonathan McDowell <noodles@earth.li>

Tested on a Ryzen 7 2700 (patched on top of 5.4.13):

| k10temp-pci-00c3
| Adapter: PCI adapter
| Vcore:        +0.80 V
| Vsoc:         +0.81 V
| Tdie:         +37.0°C
| Tctl:         +37.0°C
| Icore:        +8.31 A
| Isoc:         +6.86 A

Like the 1300X case I see a discrepancy compared to what the nct6779
driver says Vcore is:

| nct6779-isa-0290
| Adapter: ISA adapter
| Vcore:                  +0.33 V  (min =  +0.00 V, max =  +1.74 V)
| in1:                    +0.32 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| AVCC:                   +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| +3.3V:                  +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in4:                    +1.88 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in5:                    +0.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in6:                    +0.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| 3VSB:                   +3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| Vbat:                   +3.25 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in9:                    +0.00 V  (min =  +0.00 V, max =  +0.00 V)
| in10:                   +0.22 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in11:                   +1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in12:                   +1.70 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in13:                   +1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| in14:                   +1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
| fan1:                     0 RPM  (min =    0 RPM)
| fan2:                  1708 RPM  (min =    0 RPM)
| fan3:                     0 RPM  (min =    0 RPM)
| fan4:                     0 RPM  (min =    0 RPM)
| fan5:                     0 RPM  (min =    0 RPM)
| SYSTIN:                 +33.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM
| sensor = thermistor
| CPUTIN:                 -62.5°C  (high = +80.0°C, hyst = +75.0°C)
| sensor = thermistor
| AUXTIN0:                +79.0°C    sensor = thermistor
| AUXTIN1:                +96.0°C    sensor = thermistor
| AUXTIN2:                +23.0°C    sensor = thermistor
| AUXTIN3:                -22.0°C    sensor = thermistor
| SMBUSMASTER 0:          +39.0°C
| PCH_CHIP_CPU_MAX_TEMP:   +0.0°C
| PCH_CHIP_TEMP:           +0.0°C
| PCH_CPU_TEMP:            +0.0°C
| intrusion0:            ALARM
| intrusion1:            ALARM
| beep_enable:           disabled

I suspect the nct6779 is not reporting correctly (or needs some
configuration) here, as I see that's what Ken is using with his 1300X as
well.

(ASRock B450M Pro4 motherboard, fwiw.)

J.
Holger Kiehl Jan. 19, 2020, 1:38 p.m. UTC | #6
On Sat, 18 Jan 2020, Guenter Roeck wrote:

> This patch series implements various improvements for the k10temp driver.
> 
Just tested this on a 2400G. Here are the idle values:

   k10temp-pci-00c3
   Adapter: PCI adapter
   Vcore:        +0.77 V
   Vsoc:         +1.11 V
   Tdie:         +45.0°C
   Tctl:         +45.0°C
   Icore:       +10.39 A
   Isoc:         +2.89 A

   nvme-pci-0100
   Adapter: PCI adapter
   Composite:    +43.9°C  (low  = -273.1°C, high = +80.8°C)
                          (crit = +80.8°C)
   Sensor 1:     +43.9°C  (low  = -273.1°C, high = +65261.8°C)
   Sensor 2:     +48.9°C  (low  = -273.1°C, high = +65261.8°C)

   nct6793-isa-0290
   Adapter: ISA adapter
   in0:                    +0.35 V  (min =  +0.00 V, max =  +1.74 V)
   in1:                    +1.85 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in2:                    +3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in3:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in4:                    +0.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in5:                    +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in6:                    +0.66 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in7:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in8:                    +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in9:                    +1.83 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in10:                   +0.19 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in11:                   +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in12:                   +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in13:                   +1.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in14:                   +0.21 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   fan1:                     0 RPM  (min =    0 RPM)
   fan2:                   323 RPM  (min =    0 RPM)
   fan3:                     0 RPM  (min =    0 RPM)
   fan4:                     0 RPM  (min =    0 RPM)
   fan5:                     0 RPM  (min =    0 RPM)
   SYSTIN:                +112.0°C  (high =  +0.0°C, hyst =  +0.0°C)  sensor = thermistor
   CPUTIN:                 +60.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
   AUXTIN0:                +46.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
   AUXTIN1:               +106.0°C    sensor = thermistor
   AUXTIN2:               +105.0°C    sensor = thermistor
   AUXTIN3:               +102.0°C    sensor = thermistor
   SMBUSMASTER 0:          +45.0°C
   PCH_CHIP_CPU_MAX_TEMP:   +0.0°C
   PCH_CHIP_TEMP:           +0.0°C
   PCH_CPU_TEMP:            +0.0°C
   intrusion0:            OK
   intrusion1:            ALARM
   beep_enable:           disabled

   amdgpu-pci-0300
   Adapter: PCI adapter
   vddgfx:           N/A
   vddnb:            N/A
   edge:         +45.0°C  (crit = +80.0°C, hyst =  +0.0°C)

And here with some high load:

   k10temp-pci-00c3
   Adapter: PCI adapter
   Vcore:        +1.32 V
   Vsoc:         +1.11 V
   Tdie:         +77.1°C
   Tctl:         +77.1°C
   Icore:       +85.22 A
   Isoc:         +3.61 A

   nvme-pci-0100
   Adapter: PCI adapter
   Composite:    +42.9°C  (low  = -273.1°C, high = +80.8°C)
                          (crit = +80.8°C)
   Sensor 1:     +42.9°C  (low  = -273.1°C, high = +65261.8°C)
   Sensor 2:     +45.9°C  (low  = -273.1°C, high = +65261.8°C)

   nct6793-isa-0290
   Adapter: ISA adapter
   in0:                    +0.68 V  (min =  +0.00 V, max =  +1.74 V)
   in1:                    +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in2:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in3:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in4:                    +0.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in5:                    +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in6:                    +0.66 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in7:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in8:                    +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in9:                    +1.83 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in10:                   +0.19 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in11:                   +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in12:                   +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in13:                   +1.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in14:                   +0.20 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   fan1:                     0 RPM  (min =    0 RPM)
   fan2:                  1931 RPM  (min =    0 RPM)
   fan3:                     0 RPM  (min =    0 RPM)
   fan4:                     0 RPM  (min =    0 RPM)
   fan5:                     0 RPM  (min =    0 RPM)
   SYSTIN:                +113.0°C  (high =  +0.0°C, hyst =  +0.0°C)  sensor = thermistor
   CPUTIN:                 +64.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
   AUXTIN0:                +45.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
   AUXTIN1:               +107.0°C    sensor = thermistor
   AUXTIN2:               +105.0°C    sensor = thermistor
   AUXTIN3:               +102.0°C    sensor = thermistor
   SMBUSMASTER 0:          +77.0°C
   PCH_CHIP_CPU_MAX_TEMP:   +0.0°C
   PCH_CHIP_TEMP:           +0.0°C
   PCH_CPU_TEMP:            +0.0°C
   intrusion0:            OK
   intrusion1:            ALARM
   beep_enable:           disabled

   amdgpu-pci-0300
   Adapter: PCI adapter
   vddgfx:           N/A
   vddnb:            N/A
   edge:         +77.0°C  (crit = +80.0°C, hyst =  +0.0°C)

I have also tried this on an EPYC 7302. Before the patch:

   k10temp-pci-00c3
   Adapter: PCI adapter
   Tdie:         +28.1°C  (high = +70.0°C)
   Tctl:         +28.1°C 

and after:

   k10temp-pci-00c3
   Adapter: PCI adapter
   Tdie:         +28.2°C  
   Tctl:         +28.2°C

No extra values shown, but I think this is expected.
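
A userspace reader has to cope with the limit simply not being there any
more. Below is a minimal Python sketch, assuming the usual hwmon sysfs
naming (temp1_label, temp1_input and temp1_max, with values in
millidegrees Celsius). The directory is a temporary stand-in rather than a
real /sys path, and treating temp1 as Tdie is an assumption made for the
demo only.

```python
import os
import tempfile


def read_milli(path):
    """Read an integer value in milli-units; return None if the file is absent."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except FileNotFoundError:
        return None


def read_temp(hwmon_dir, index=1):
    """Return (label, current, maximum); maximum is None when not exposed."""
    base = os.path.join(hwmon_dir, f"temp{index}")
    try:
        with open(base + "_label") as f:
            label = f.read().strip()
    except FileNotFoundError:
        label = f"temp{index}"
    return label, read_milli(base + "_input"), read_milli(base + "_max")


def format_temp(label, current, maximum):
    """Format one reading, appending the limit only when one exists."""
    text = f"{label}: {current:+.1f}°C"
    if maximum is not None:
        text += f"  (high = {maximum:+.1f}°C)"
    return text


def make_fake_hwmon(root, with_max):
    """Create a stand-in hwmon directory for the demo (not real sysfs)."""
    files = {"temp1_label": "Tdie\n", "temp1_input": "28200\n"}
    if with_max:
        files["temp1_max"] = "70000\n"
    for name, content in files.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(content)


if __name__ == "__main__":
    # First pass mimics the old behaviour (limit present), second the new one.
    for with_max in (True, False):
        with tempfile.TemporaryDirectory() as d:
            make_fake_hwmon(d, with_max)
            print(format_temp(*read_temp(d)))
```

Returning None for a missing attribute keeps the "no limit" case explicit
instead of substituting a made-up default.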

Tested-by: Holger Kiehl <holger.kiehl@dwd.de>

Holger
Guenter Roeck Jan. 19, 2020, 3:46 p.m. UTC | #7
On 1/19/20 2:18 AM, Jonathan McDowell wrote:
> 
> In article <20200118172615.26329-1-linux@roeck-us.net> (earth.lists.linux-kernel) you wrote:
>> This patch series implements various improvements for the k10temp driver.
> ...
>> The voltage and current information is limited to Ryzen CPUs. Voltage
>> and current reporting on Threadripper and EPYC CPUs is different, and the
>> reported information is either incomplete or wrong. Exclude it for the time
>> being; it can always be added if/when more information becomes available.
> 
>> Tested with the following Ryzen CPUs:
> 
> Tested-By: Jonathan McDowell <noodles@earth.li>
> 
Thanks!

> Tested on a Ryzen 7 2700 (patched on top of 5.4.13):
> 
> | k10temp-pci-00c3
> | Adapter: PCI adapter
> | Vcore:        +0.80 V
> | Vsoc:         +0.81 V
> | Tdie:         +37.0°C
> | Tctl:         +37.0°C
> | Icore:        +8.31 A
> | Isoc:         +6.86 A
> 
> Like the 1300X case I see a discrepancy compared to what the nct6779
> driver says Vcore is:
> 
> | nct6779-isa-0290
> | Adapter: ISA adapter
> | Vcore:                  +0.33 V  (min =  +0.00 V, max =  +1.74 V)

I see that on all of my boards as well (3900X, different boards and board vendors),
with voltages reported by the Super-IO chip sometimes as low as 0.18V (!).
Yet, there is a clear correlation of that voltage with CPU load.
I suspect the measurement by the Super-IO chip is a different voltage.

I don't think there is anything we can do about that without access to more
information.

> | in1:                    +0.32 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | AVCC:                   +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | +3.3V:                  +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in4:                    +1.88 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in5:                    +0.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in6:                    +0.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | 3VSB:                   +3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | Vbat:                   +3.25 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in9:                    +0.00 V  (min =  +0.00 V, max =  +0.00 V)
> | in10:                   +0.22 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in11:                   +1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in12:                   +1.70 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in13:                   +1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | in14:                   +1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> | fan1:                     0 RPM  (min =    0 RPM)
> | fan2:                  1708 RPM  (min =    0 RPM)
> | fan3:                     0 RPM  (min =    0 RPM)
> | fan4:                     0 RPM  (min =    0 RPM)
> | fan5:                     0 RPM  (min =    0 RPM)
> | SYSTIN:                 +33.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM
> | sensor = thermistor
> | CPUTIN:                 -62.5°C  (high = +80.0°C, hyst = +75.0°C)
> | sensor = thermistor
> | AUXTIN0:                +79.0°C    sensor = thermistor
> | AUXTIN1:                +96.0°C    sensor = thermistor
> | AUXTIN2:                +23.0°C    sensor = thermistor
> | AUXTIN3:                -22.0°C    sensor = thermistor
> | SMBUSMASTER 0:          +39.0°C
> | PCH_CHIP_CPU_MAX_TEMP:   +0.0°C
> | PCH_CHIP_TEMP:           +0.0°C
> | PCH_CPU_TEMP:            +0.0°C
> | intrusion0:            ALARM
> | intrusion1:            ALARM
> | beep_enable:           disabled
> 
> I suspect the nct6779 is not reporting correctly (or needs some
> configuration) here, as I see that's what Ken is using with his 1300X as
> well.
> 
Initially I thought the voltage reported by the Super-IO chip would help
us understand what is going on, but that is not really the case.

The problem with Ken's board is that idle current and voltage are very high.
The idle voltage claims to be higher than the voltage under load, which
doesn't really make sense. This is only reflected in the voltage and current
reported by the CPU, but not by the voltage reported by the Super-IO chip.

Thanks,
Guenter
Guenter Roeck Jan. 19, 2020, 3:49 p.m. UTC | #8
On 1/19/20 5:38 AM, Holger Kiehl wrote:
> On Sat, 18 Jan 2020, Guenter Roeck wrote:
> 
>> This patch series implements various improvements for the k10temp driver.
>>
>> Patch 1/5 introduces the use of bit operations.
>>
>> Patch 2/5 converts the driver to use the devm_hwmon_device_register_with_info
>> API. This not only simplifies the code and reduces its size, it also
>> makes the code easier to maintain and enhance.
>>
>> Patch 3/5 adds support for reporting Core Complex Die (CCD) temperatures
>> on Ryzen 3 (Zen2) CPUs.
>>
>> Patch 4/5 adds support for reporting core and SoC current and voltage
>> information on Ryzen CPUs.
>>
>> Patch 5/5 removes the maximum temperature from Tdie for Ryzen CPUs.
>> It is inaccurate, misleading, and it just doesn't make sense to report
>> wrong information.
>>
>> With all patches in place, output on Ryzen 3900X CPUs looks as follows
>> (with the system under load).
>>
>> k10temp-pci-00c3
>> Adapter: PCI adapter
>> Vcore:        +1.36 V
>> Vsoc:         +1.18 V
>> Tdie:         +86.8°C
>> Tctl:         +86.8°C
>> Tccd1:        +80.0°C
>> Tccd2:        +81.8°C
>> Icore:       +44.14 A
>> Isoc:        +13.83 A
>>
>> The voltage and current information is limited to Ryzen CPUs. Voltage
>> and current reporting on Threadripper and EPYC CPUs is different, and the
>> reported information is either incomplete or wrong. Exclude it for the time
>> being; it can always be added if/when more information becomes available.
>>
>> Tested with the following Ryzen CPUs:
>>      1300X A user with this CPU in the system reported somewhat unexpected
>>            values for Vcore; it isn't entirely if at all clear why that is
>>            the case. Overall this does not warrant holding up the series.
>>      1600
>>      1800X
>>      2200G
>>      2400G
>>      3800X
>>      3900X
>>      3950X
>>
>> v2: Added tested-by: tags as received.
>>      Don't display voltage and current information for Threadripper and EPYC.
>>      Stop displaying the fixed (and wrong) maximum temperature of 70 degrees C
>>      for Tdie on model 17h/18h CPUs.
>>
> Just tested this on a 2400G. Here idle values:
> 
>     k10temp-pci-00c3
>     Adapter: PCI adapter
>     Vcore:        +0.77 V
>     Vsoc:         +1.11 V
>     Tdie:         +45.0°C
>     Tctl:         +45.0°C
>     Icore:       +10.39 A
>     Isoc:         +2.89 A
> 
>     nvme-pci-0100
>     Adapter: PCI adapter
>     Composite:    +43.9°C  (low  = -273.1°C, high = +80.8°C)
>                            (crit = +80.8°C)
>     Sensor 1:     +43.9°C  (low  = -273.1°C, high = +65261.8°C)
>     Sensor 2:     +48.9°C  (low  = -273.1°C, high = +65261.8°C)
> 
>     nct6793-isa-0290
>     Adapter: ISA adapter
>     in0:                    +0.35 V  (min =  +0.00 V, max =  +1.74 V)
>     in1:                    +1.85 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in2:                    +3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in3:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in4:                    +0.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in5:                    +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in6:                    +0.66 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in7:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in8:                    +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in9:                    +1.83 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in10:                   +0.19 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in11:                   +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in12:                   +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in13:                   +1.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in14:                   +0.21 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     fan1:                     0 RPM  (min =    0 RPM)
>     fan2:                   323 RPM  (min =    0 RPM)
>     fan3:                     0 RPM  (min =    0 RPM)
>     fan4:                     0 RPM  (min =    0 RPM)
>     fan5:                     0 RPM  (min =    0 RPM)
>     SYSTIN:                +112.0°C  (high =  +0.0°C, hyst =  +0.0°C)  sensor = thermistor
>     CPUTIN:                 +60.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
>     AUXTIN0:                +46.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
>     AUXTIN1:               +106.0°C    sensor = thermistor
>     AUXTIN2:               +105.0°C    sensor = thermistor
>     AUXTIN3:               +102.0°C    sensor = thermistor
>     SMBUSMASTER 0:          +45.0°C
>     PCH_CHIP_CPU_MAX_TEMP:   +0.0°C
>     PCH_CHIP_TEMP:           +0.0°C
>     PCH_CPU_TEMP:            +0.0°C
>     intrusion0:            OK
>     intrusion1:            ALARM
>     beep_enable:           disabled
> 
>     amdgpu-pci-0300
>     Adapter: PCI adapter
>     vddgfx:           N/A
>     vddnb:            N/A
>     edge:         +45.0°C  (crit = +80.0°C, hyst =  +0.0°C)
> 
> And here are the values under high load:
> 
>     k10temp-pci-00c3
>     Adapter: PCI adapter
>     Vcore:        +1.32 V
>     Vsoc:         +1.11 V
>     Tdie:         +77.1°C
>     Tctl:         +77.1°C
>     Icore:       +85.22 A
>     Isoc:         +3.61 A
> 
>     nvme-pci-0100
>     Adapter: PCI adapter
>     Composite:    +42.9°C  (low  = -273.1°C, high = +80.8°C)
>                            (crit = +80.8°C)
>     Sensor 1:     +42.9°C  (low  = -273.1°C, high = +65261.8°C)
>     Sensor 2:     +45.9°C  (low  = -273.1°C, high = +65261.8°C)
> 
>     nct6793-isa-0290
>     Adapter: ISA adapter
>     in0:                    +0.68 V  (min =  +0.00 V, max =  +1.74 V)
>     in1:                    +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in2:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in3:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in4:                    +0.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in5:                    +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in6:                    +0.66 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in7:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in8:                    +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in9:                    +1.83 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in10:                   +0.19 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in11:                   +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in12:                   +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in13:                   +1.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     in14:                   +0.20 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
>     fan1:                     0 RPM  (min =    0 RPM)
>     fan2:                  1931 RPM  (min =    0 RPM)
>     fan3:                     0 RPM  (min =    0 RPM)
>     fan4:                     0 RPM  (min =    0 RPM)
>     fan5:                     0 RPM  (min =    0 RPM)
>     SYSTIN:                +113.0°C  (high =  +0.0°C, hyst =  +0.0°C)  sensor = thermistor
>     CPUTIN:                 +64.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
>     AUXTIN0:                +45.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
>     AUXTIN1:               +107.0°C    sensor = thermistor
>     AUXTIN2:               +105.0°C    sensor = thermistor
>     AUXTIN3:               +102.0°C    sensor = thermistor
>     SMBUSMASTER 0:          +77.0°C
>     PCH_CHIP_CPU_MAX_TEMP:   +0.0°C
>     PCH_CHIP_TEMP:           +0.0°C
>     PCH_CPU_TEMP:            +0.0°C
>     intrusion0:            OK
>     intrusion1:            ALARM
>     beep_enable:           disabled
> 
>     amdgpu-pci-0300
>     Adapter: PCI adapter
>     vddgfx:           N/A
>     vddnb:            N/A
>     edge:         +77.0°C  (crit = +80.0°C, hyst =  +0.0°C)
> 
> I have also tried this on an EPYC 7302. Before the patch:
> 
>     k10temp-pci-00c3
>     Adapter: PCI adapter
>     Tdie:         +28.1°C  (high = +70.0°C)
>     Tctl:         +28.1°C
> 
> and after:
> 
>     k10temp-pci-00c3
>     Adapter: PCI adapter
>     Tdie:         +28.2°C
>     Tctl:         +28.2°C
> 
> No extra values shown, but I think this is expected.
> 
Unfortunately yes, but it helps to confirm that the detection works.

> Tested-by: Holger Kiehl <holger.kiehl@dwd.de>
> 

Thanks again!

Guenter
Jonathan McDowell Jan. 19, 2020, 7:38 p.m. UTC | #9
On Sun, Jan 19, 2020 at 07:46:11AM -0800, Guenter Roeck wrote:
> On 1/19/20 2:18 AM, Jonathan McDowell wrote:
> > 
> > In article <20200118172615.26329-1-linux@roeck-us.net> (earth.lists.linux-kernel) you wrote:
> > > This patch series implements various improvements for the k10temp driver.
> > ...
> > > The voltage and current information is limited to Ryzen CPUs. Voltage
> > > and current reporting on Threadripper and EPYC CPUs is different, and the
> > > reported information is either incomplete or wrong. Exclude it for the time
> > > being; it can always be added if/when more information becomes available.
> > 
> > > Tested with the following Ryzen CPUs:
> > 
> > Tested-by: Jonathan McDowell <noodles@earth.li>
> > 
> Thanks!
> 
> > Tested on a Ryzen 7 2700 (patched on top of 5.4.13):
> > 
> > | k10temp-pci-00c3
> > | Adapter: PCI adapter
> > | Vcore:        +0.80 V
> > | Vsoc:         +0.81 V
> > | Tdie:         +37.0°C
> > | Tctl:         +37.0°C
> > | Icore:        +8.31 A
> > | Isoc:         +6.86 A
> > 
> > Like the 1300X case I see a discrepancy compared to what the nct6779
> > driver says Vcore is:
> > 
> > | nct6779-isa-0290
> > | Adapter: ISA adapter
> > | Vcore:                  +0.33 V  (min =  +0.00 V, max =  +1.74 V)
> 
> I see that on all of my boards as well (3900X, different boards and board vendors),
> with voltages reported by the Super-IO chip sometimes as low as 0.18V (!).
> Yet, there is a clear correlation of that voltage with CPU load.
> I suspect the Super-IO chip is measuring a different voltage.
> 
> I don't think there is anything we can do about that without access to more
> information.
...
> The problem with Ken's board is that idle current and voltage are very high.
> The idle voltage claims to be higher than the voltage under load, which
> doesn't really make sense. This is only reflected in the voltage and current
> reported by the CPU, but not in the voltage reported by the Super-IO chip.

I see a clear correlation between load/Vcore/Icore/Tdie from your patched
k10temp driver, which leads me to believe these numbers are valid for the
2700. Vsoc is fairly consistent and Isoc doesn't vary much either
(6.3-8.1A range over the past 8 hours).
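
For anyone wanting to repeat this kind of check, here is a minimal sketch of
how a correlation could be computed from logged `sensors` output. It is not
part of the series and not how the numbers above were obtained. The helper
names (parse_chip, pearson) are hypothetical, and the parser assumes the
"Label:  +value unit" text layout shown in this thread.

```python
import re
from statistics import mean

# Matches value lines such as "Vcore:        +0.80 V" or "Tdie:  +37.0°C".
# Lines without a numeric reading (e.g. "Adapter: PCI adapter") are skipped.
_VALUE_LINE = re.compile(r"^\s*(\w+):\s*([+-]?\d+(?:\.\d+)?)\s*(V|A|°C)")


def parse_chip(text, chip_prefix="k10temp"):
    """Return {label: value} for the first chip block starting with chip_prefix.

    Assumption: chip blocks are separated by blank lines, as in the
    `sensors` output pasted in this thread.
    """
    values = {}
    in_chip = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            if in_chip:
                break  # a blank line ends the chip block we were reading
            continue
        if stripped.startswith(chip_prefix):
            in_chip = True
            continue
        if in_chip:
            match = _VALUE_LINE.match(line)
            if match:
                values[match.group(1)] = float(match.group(2))
    return values


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / (var_x * var_y) ** 0.5


# Readings copied from the Ryzen 7 2700 report in this thread.
SAMPLE = """\
k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +0.80 V
Vsoc:         +0.81 V
Tdie:         +37.0°C
Tctl:         +37.0°C
Icore:        +8.31 A
Isoc:         +6.86 A

nct6779-isa-0290
Adapter: ISA adapter
Vcore:                  +0.33 V  (min =  +0.00 V, max =  +1.74 V)
"""

if __name__ == "__main__":
    print(parse_chip(SAMPLE))             # k10temp readings
    print(parse_chip(SAMPLE, "nct6779"))  # Super-IO reading for comparison
```

Calling parse_chip() on each logged snapshot and passing, say, the Icore and
Tdie series to pearson() gives one number per pair of sensors. Two snapshots
always yield +1 or -1, so a meaningful result needs many samples taken across
varying load.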

J.