[RFT,0/4] hwmon: k10temp driver improvements

Message ID 20200116141800.9828-1-linux@roeck-us.net (mailing list archive)

Message

Guenter Roeck Jan. 16, 2020, 2:17 p.m. UTC
This patch series implements various improvements for the k10temp driver.

Patch 1/4 introduces the use of bit operations.

Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
API. This not only simplifies the code and reduces its size, it also
makes the code easier to maintain and enhance. 

Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
on Ryzen 3 (Zen2) CPUs.

Patch 4/4 adds support for reporting core and SoC current and voltage
information on Ryzen CPUs.
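
To illustrate the bit-operation style that patch 1/4 refers to, here is a minimal user-space sketch in Python. The bit() and genmask() helpers mirror the kernel's BIT() and GENMASK() macros. The register layout used below (a temperature field in bits 31:21 with 0.125 degC steps, and a range-select flag in bit 19 that shifts the reading by -49 degC) is an assumption made for the example and is not stated anywhere in this thread.

```python
def bit(n: int) -> int:
    """Return a mask with only bit n set (like the kernel's BIT(n))."""
    return 1 << n


def genmask(high: int, low: int) -> int:
    """Return a mask with bits high..low set (like the kernel's GENMASK(high, low))."""
    return ((1 << (high - low + 1)) - 1) << low


# Assumed, illustrative register layout (not taken from the patch series).
TEMP_SHIFT = 21
TEMP_FIELD = genmask(31, TEMP_SHIFT)  # raw temperature, 0.125 degC per step
RANGE_SEL = bit(19)                   # when set, the reading is offset by -49 degC


def decode_temp_millideg(regval: int) -> int:
    """Decode a 32-bit register value into millidegrees Celsius."""
    raw = (regval & TEMP_FIELD) >> TEMP_SHIFT
    temp = raw * 125                  # 0.125 degC steps -> millidegrees
    if regval & RANGE_SEL:
        temp -= 49000
    return temp
```

Named masks keep the field positions in one place, which is the main readability gain over open-coded shifts and hex constants.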

With all patches in place, output on Ryzen 3900 CPUs looks as follows
(with the system under load).

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.36 V
Vsoc:         +1.18 V
Tdie:         +86.8°C  (high = +70.0°C)
Tctl:         +86.8°C
Tccd1:        +80.0°C
Tccd2:        +81.8°C
Icore:       +44.14 A
Isoc:        +13.83 A
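
For testers who prefer to read the raw values rather than go through sensors, the following Python sketch reads one hwmon directory directly. It relies only on the standard hwmon sysfs conventions (temp*_input in millidegrees Celsius, in*_input in millivolts, curr*_input in milliamperes, optional *_label files). The function name read_hwmon is made up for this example, and which channel numbers the driver actually exposes is not something this sketch assumes.

```python
from pathlib import Path

# Standard hwmon sysfs scaling: all three channel types are reported in milli-units.
UNITS = {"temp": "°C", "in": "V", "curr": "A"}


def read_hwmon(hwmon_dir):
    """Return {label: (value, unit)} for the temp/in/curr channels of one hwmon directory."""
    readings = {}
    base = Path(hwmon_dir)
    for input_file in sorted(base.glob("*_input")):
        channel = input_file.name[: -len("_input")]   # e.g. "temp1"
        kind = channel.rstrip("0123456789")           # e.g. "temp"
        if kind not in UNITS:
            continue                                  # skip fans and other channel types
        label_file = base / f"{channel}_label"
        # Fall back to the channel name when the driver provides no label.
        label = label_file.read_text().strip() if label_file.exists() else channel
        value = int(input_file.read_text().strip()) / 1000.0
        readings[label] = (value, UNITS[kind])
    return readings
```

Calling read_hwmon() on each /sys/class/hwmon/hwmon* directory whose name file contains k10temp should give the same numbers that sensors prints, before any sensors3.conf relabeling or scaling is applied.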

The patch series has only been tested with Ryzen 3900 CPUs. Further test
coverage will be necessary before the changes can be applied to the Linux
kernel.

Comments

Darren Salt Jan. 16, 2020, 8:55 p.m. UTC | #1
Tested-By: Darren Salt <devspam@moreofthesa.me.uk>

Linux 5.4.12, Ryzen 5 1600. Patches were applied cleanly. No problems noticed in

    $ sensors k10temp-pci-00c3
    k10temp-pci-00c3
    Adapter: PCI adapter
    Vcore:        +1.11 V
    Vsoc:         +0.94 V
    Tdie:         +42.8°C  (high = +70.0°C)
    Tctl:         +42.8°C
    Icore:       +15.59 A
    Isoc:        +12.63 A
Guenter Roeck Jan. 16, 2020, 9:11 p.m. UTC | #2
On Thu, Jan 16, 2020 at 08:55:16PM +0000, Darren Salt wrote:
> Tested-By: Darren Salt <devspam@moreofthesa.me.uk>
> 
Thanks!

Guenter

> Linux 5.4.12, Ryzen 5 1600. Patches were applied cleanly. No problems noticed in
> 
>     $ sensors k10temp-pci-00c3
>     k10temp-pci-00c3
>     Adapter: PCI adapter
>     Vcore:	  +1.11 V
>     Vsoc:	  +0.94 V
>     Tdie:	  +42.8°C  (high = +70.0°C)
>     Tctl:         +42.8°C
>     Icore:       +15.59 A
>     Isoc:    	 +12.63 A
>     
>     $
> 
> -- 
> |  _  | Darren Salt, using Debian GNU/Linux (and Android)
> | ( ) |
> |  X  | ASCII Ribbon campaign against HTML e-mail
> | / \ |
Bernhard Gebetsberger Jan. 16, 2020, 10:46 p.m. UTC | #3
Tested-by: Bernhard Gebetsberger <bernhard.gebetsberger@gmx.at>

Patches applied cleanly on top of 5.5-rc6, no issues using a Ryzen 3 2200G:
k10temp-pci-00c3
Adapter: PCI adapter
Vcore:         1.29 V 
Vsoc:          1.12 V 
Tdie:         +28.2°C  (high = +70.0°C)
Tctl:         +28.2°C 
Icore:        23.90 A 
Isoc:          6.49 A

- Bernhard
Guenter Roeck Jan. 16, 2020, 10:52 p.m. UTC | #4
On Thu, Jan 16, 2020 at 11:46:47PM +0100, Bernhard Gebetsberger wrote:
> Tested-by: Bernhard Gebetsberger <bernhard.gebetsberger@gmx.at>
> 
> Patches applied cleanly on top of 5.5-rc6, no issues using a Ryzen 3 2200G:
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:         1.29 V 
> Vsoc:          1.12 V 
> Tdie:         +28.2°C  (high = +70.0°C)
> Tctl:         +28.2°C 
> Icore:        23.90 A 
> Isoc:          6.49 A
> 

Thanks!

Guenter
Ken Moffat Jan. 17, 2020, 12:38 a.m. UTC | #5
On Thu, 16 Jan 2020 at 14:18, Guenter Roeck <linux@roeck-us.net> wrote:
>
> This patch series implements various improvements for the k10temp driver.
>
> Patch 1/4 introduces the use of bit operations.
>
> Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
> API. This not only simplifies the code and reduces its size, it also
> makes the code easier to maintain and enhance.
>
> Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
> on Ryzen 3 (Zen2) CPUs.
>
> Patch 4/4 adds support for reporting core and SoC current and voltage
> information on Ryzen CPUs.
>

> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:        +1.36 V
> Vsoc:         +1.18 V
> Tdie:         +86.8°C  (high = +70.0°C)
> Tctl:         +86.8°C
> Tccd1:        +80.0°C
> Tccd2:        +81.8°C
> Icore:       +44.14 A
> Isoc:        +13.83 A
>
> The patch series has only been tested with Ryzen 3900 CPUs. Further test
> coverage will be necessary before the changes can be applied to the Linux
> kernel.

I have some Zen1 and Zen1+ here.

My Ryzen 3 1300X, patches applied to 5.5.0-rc5.

With the machine idle, I thought at first the temperature might be a bit low,
so I've added the other reported temperatures. I now think it is maybe OK.

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.41 V
Vsoc:         +0.89 V
Tdie:         +21.2°C  (high = +70.0°C)
Tctl:         +21.2°C
Icore:       +30.14 A
Isoc:         +8.66 A

SYSTIN:                 +29.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:                 +25.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                 -1.5°C    sensor = thermistor
AUXTIN1:                +87.0°C    sensor = thermistor
AUXTIN2:                +23.0°C    sensor = thermistor
AUXTIN3:                -27.0°C    sensor = thermistor
SMBUSMASTER 0:          +20.5°C

After about 2 minutes of make -j8 on the kernel, to load it:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.26 V
Vsoc:         +0.89 V
Tdie:         +46.2°C  (high = +70.0°C)
Tctl:         +46.2°C
Icore:       +45.73 A
Isoc:        +11.18 A

SYSTIN:                 +29.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:                 +38.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                 -7.5°C    sensor = thermistor
AUXTIN1:                +85.0°C    sensor = thermistor
AUXTIN2:                +23.0°C    sensor = thermistor
AUXTIN3:                -27.0°C    sensor = thermistor
SMBUSMASTER 0:          +46.0°C

So I guess the temperatures *are* in the right area.
Interestingly, Vcore returns to above +1.4 V when idle.

And my Ryzen 5 3400G (Zen+), patches applied to 5.4.12, box idle,
also showing the GPU measurements of this APU to confirm the
temperature:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +0.94 V
Vsoc:         +1.09 V
Tdie:         +34.8°C  (high = +70.0°C)
Tctl:         +34.8°C
Icore:        +6.24 A
Isoc:         +8.30 A

amdgpu-pci-0900
Adapter: PCI adapter
vddgfx:           N/A
vddnb:            N/A
edge:         +34.0°C  (crit = +80.0°C, hyst =  +0.0°C)

For my Ryzen 5 2500U laptop (Zen1), again showing the GPU:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +0.97 V
Vsoc:         +0.93 V
Tdie:         +37.2°C  (high = +70.0°C)
Tctl:         +37.2°C
Icore:       +19.75 A
Isoc:         +8.66 A

amdgpu-pci-0300
Adapter: PCI adapter
vddgfx:           N/A
vddnb:            N/A
edge:         +37.0°C  (crit = +80.0°C, hyst =  +0.0°C)

Thanks.
ĸen
Guenter Roeck Jan. 17, 2020, 3:58 a.m. UTC | #6
Hi Ken,

On 1/16/20 4:38 PM, Ken Moffat wrote:
> On Thu, 16 Jan 2020 at 14:18, Guenter Roeck <linux@roeck-us.net> wrote:
[ ... ]
> I have some Zen1 and Zen1+ here.
> 
> My Ryzen 3 1300X, applied to 5.5.0-rc5
> 
> machine idle, I thought at first the temperature may be a bit low, so
> I've added other reported temperatures.  I now think it is maybe ok.
> 
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:        +1.41 V
> Vsoc:         +0.89 V
> Tdie:         +21.2°C  (high = +70.0°C)
> Tctl:         +21.2°C
> Icore:       +30.14 A
> Isoc:         +8.66 A
> 
> SYSTIN:                 +29.0°C  (high =  +0.0°C, hyst =  +0.0°C)
> ALARM  sensor = thermistor
> CPUTIN:                 +25.5°C  (high = +80.0°C, hyst = +75.0°C)
> sensor = thermistor
> AUXTIN0:                 -1.5°C    sensor = thermistor
> AUXTIN1:                +87.0°C    sensor = thermistor
> AUXTIN2:                +23.0°C    sensor = thermistor
> AUXTIN3:                -27.0°C    sensor = thermistor
> SMBUSMASTER 0:          +20.5°C
> 
SMBUSMASTER 0 is the CPU, so we have a match with the temperatures.

> After about 2 minutes of make -j8 on kernel, to load it
> 
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:        +1.26 V
> Vsoc:         +0.89 V
> Tdie:         +46.2°C  (high = +70.0°C)
> Tctl:         +46.2°C
> Icore:       +45.73 A
> Isoc:        +11.18 A
> 

Both Vcore and Icore should be much less when idle, and higher under
load. The data from the Super-IO chip suggests that it is a Nuvoton
chip. Can you report its first voltage (in0) ? That should roughly
match Vcore.
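
As a rough way to automate that sanity check, here is a small Python sketch. It takes two sets of readings and lists the ones that fail to rise from idle to load. The function name is made up for this example, and "must be strictly higher under load" is a simplification of the expectation stated above; the sample numbers are the k10temp values Ken reported for the 1300X.

```python
def check_idle_vs_load(idle, load):
    """Return the names of readings that do not rise from idle to load."""
    return [name for name in idle if name in load and load[name] <= idle[name]]


# k10temp values reported for the Ryzen 3 1300X earlier in this thread.
idle = {"Vcore": 1.41, "Icore": 30.14}
load = {"Vcore": 1.26, "Icore": 45.73}

suspicious = check_idle_vs_load(idle, load)
print(suspicious)  # prints ['Vcore']
```

Icore behaves as expected here, so only the Vcore reading is flagged.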

> SYSTIN:                 +29.0°C  (high =  +0.0°C, hyst =  +0.0°C)
> ALARM  sensor = thermistor
> CPUTIN:                 +38.5°C  (high = +80.0°C, hyst = +75.0°C)
> sensor = thermistor
> AUXTIN0:                 -7.5°C    sensor = thermistor
> AUXTIN1:                +85.0°C    sensor = thermistor
> AUXTIN2:                +23.0°C    sensor = thermistor
> AUXTIN3:                -27.0°C    sensor = thermistor
> SMBUSMASTER 0:          +46.0°C
> 
> So I guess the temperatures *are* in the right area.
> Interestingly, the Vcore restores to above +1.4V when idle.
> 
It should be much lower when idle, actually, not higher.

All other data looks ok.

Thanks,
Guenter
Ken Moffat Jan. 17, 2020, 4:47 a.m. UTC | #7
On Fri, 17 Jan 2020 at 03:58, Guenter Roeck <linux@roeck-us.net> wrote:
>
> Hi Ken,
>
> SMBUSMASTER 0 is the CPU, so we have a match with the temperatures.
>
OK, thanks for that information.

>
> Both Vcore and Icore should be much less when idle, and higher under
> load. The data from the Super-IO chip suggests that it is a Nuvoton
> chip. Can you report its first voltage (in0) ? That should roughly
> match Vcore.
>
> All other data looks ok.
>
> Thanks,
> Guenter

Hi Guenter,

Unfortunately, I don't have any report of in0. I'm guessing I need some
module(s) which did not seem to do anything useful in the past.

All I have in the 'in' area is:
nct6779-isa-0290
Adapter: ISA adapter
Vcore:                  +0.30 V  (min =  +0.00 V, max =  +1.74 V)
in1:                    +0.00 V  (min =  +0.00 V, max =  +0.00 V)
AVCC:                   +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:                  +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                    +1.90 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                    +0.90 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                    +1.50 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:                   +3.47 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:                   +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                    +0.00 V  (min =  +0.00 V, max =  +0.00 V)
in10:                   +0.32 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                   +1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                   +1.70 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                   +0.94 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                   +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM

and at that point k10temp reported Vcore as 1.41 V (system idle).

ĸen
Ondrej Čerman Jan. 17, 2020, 9:46 a.m. UTC | #8
On 16. 1. 2020 at 15:17, Guenter Roeck wrote:
> This patch series implements various improvements for the k10temp driver.
>
> Patch 1/4 introduces the use of bit operations.
>
> Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
> API. This not only simplifies the code and reduces its size, it also
> makes the code easier to maintain and enhance.
>
> Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
> on Ryzen 3 (Zen2) CPUs.
>
> Patch 4/4 adds support for reporting core and SoC current and voltage
> information on Ryzen CPUs.
>
> With all patches in place, output on Ryzen 3900 CPUs looks as follows
> (with the system under load).
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:        +1.36 V
> Vsoc:         +1.18 V
> Tdie:         +86.8°C  (high = +70.0°C)
> Tctl:         +86.8°C
> Tccd1:        +80.0°C
> Tccd2:        +81.8°C
> Icore:       +44.14 A
> Isoc:        +13.83 A
>
> The patch series has only been tested with Ryzen 3900 CPUs. Further test
> coverage will be necessary before the changes can be applied to the Linux
> kernel.
>
Hello everyone, I am the author of https://github.com/ocerman/zenpower/ .

It is nice to see this merged.

I just want to warn you that issues with Threadripper CPUs have been
reported to the zenpower issue tracker. Also, I think that no one has
tested EPYC CPUs.

I figured out most of this by trial and error, and unfortunately, because
I do not own a Threadripper CPU, I was not able to test and fix the
reported problems.

Ondrej.
Holger Kiehl Jan. 17, 2020, 9:58 a.m. UTC | #9
On Thu, 16 Jan 2020, Guenter Roeck wrote:

> This patch series implements various improvements for the k10temp driver.
> 
> Patch 1/4 introduces the use of bit operations.
> 
> Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
> API. This not only simplifies the code and reduces its size, it also
> makes the code easier to maintain and enhance. 
> 
> Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
> on Ryzen 3 (Zen2) CPUs.
> 
> Patch 4/4 adds support for reporting core and SoC current and voltage
> information on Ryzen CPUs.
> 
> With all patches in place, output on Ryzen 3900 CPUs looks as follows
> (with the system under load).
> 
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:        +1.36 V
> Vsoc:         +1.18 V
> Tdie:         +86.8°C  (high = +70.0°C)
> Tctl:         +86.8°C
> Tccd1:        +80.0°C
> Tccd2:        +81.8°C
> Icore:       +44.14 A
> Isoc:        +13.83 A
> 
> The patch series has only been tested with Ryzen 3900 CPUs. Further test
> coverage will be necessary before the changes can be applied to the Linux
> kernel.
> 
Here is the output from my little ASRock A300 with a Ryzen 2400G:

   sensors
   k10temp-pci-00c3
   Adapter: PCI adapter
   Vcore:        +0.78 V  
   Vsoc:         +1.11 V  
   Tdie:         +44.8°C  (high = +70.0°C)
   Tctl:         +44.8°C  
   Icore:        +5.20 A  
   Isoc:         +2.17 A  

   nvme-pci-0100
   Adapter: PCI adapter
   Composite:    +41.9°C  (low  = -273.1°C, high = +80.8°C)
                          (crit = +80.8°C)
   Sensor 1:     +41.9°C  (low  = -273.1°C, high = +65261.8°C)
   Sensor 2:     +44.9°C  (low  = -273.1°C, high = +65261.8°C)

   nct6793-isa-0290
   Adapter: ISA adapter
   in0:                    +0.34 V  (min =  +0.00 V, max =  +1.74 V)
   in1:                    +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in2:                    +3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in3:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in4:                    +0.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in5:                    +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in6:                    +0.67 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in7:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in8:                    +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in9:                    +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in10:                   +0.19 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in11:                   +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in12:                   +1.85 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in13:                   +1.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   in14:                   +0.20 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
   fan1:                     0 RPM  (min =    0 RPM)
   fan2:                   317 RPM  (min =    0 RPM)
   fan3:                     0 RPM  (min =    0 RPM)
   fan4:                     0 RPM  (min =    0 RPM)
   fan5:                     0 RPM  (min =    0 RPM)
   SYSTIN:                +113.0°C  (high =  +0.0°C, hyst =  +0.0°C)  sensor = thermistor
   CPUTIN:                 +59.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
   AUXTIN0:                +45.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
   AUXTIN1:               +107.0°C    sensor = thermistor
   AUXTIN2:               +106.0°C    sensor = thermistor
   AUXTIN3:               +103.0°C    sensor = thermistor
   SMBUSMASTER 0:          +44.5°C  
   PCH_CHIP_CPU_MAX_TEMP:   +0.0°C  
   PCH_CHIP_TEMP:           +0.0°C  
   PCH_CPU_TEMP:            +0.0°C  
   intrusion0:            OK
   intrusion1:            ALARM
   beep_enable:           disabled

   amdgpu-pci-0300
   Adapter: PCI adapter
   vddgfx:           N/A  
   vddnb:            N/A  
   edge:         +44.0°C  (crit = +80.0°C, hyst =  +0.0°C)

Patches applied without any problem against Linus' git tree.

Many thanks for this work!

Regards,
Holger
Guenter Roeck Jan. 17, 2020, 2:14 p.m. UTC | #10
On 1/16/20 8:47 PM, Ken Moffat wrote:
> On Fri, 17 Jan 2020 at 03:58, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> Hi Ken,
>>
>> SMBUSMASTER 0 is the CPU, so we have a match with the temperatures.
>>
> OK, thanks for that information.
> 
>>
>> Both Vcore and Icore should be much less when idle, and higher under
>> load. The data from the Super-IO chip suggests that it is a Nuvoton
>> chip. Can you report its first voltage (in0) ? That should roughly
>> match Vcore.
>>
>> All other data looks ok.
>>
>> Thanks,
>> Guenter
> 
> Hi Guenter,
> 
> unfortunately I don't have any report of in0. I'm guessing I need some
> module(s) which did not seem to do anything useful in the past.
> 
> All I have in the 'in' area is
> nct6779-isa-0290
> Adapter: ISA adapter
> Vcore:                  +0.30 V  (min =  +0.00 V, max =  +1.74 V)
> in1:                    +0.00 V  (min =  +0.00 V, max =  +0.00 V)
> AVCC:                   +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> +3.3V:                  +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in4:                    +1.90 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in5:                    +0.90 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in6:                    +1.50 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> 3VSB:                   +3.47 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> Vbat:                   +3.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in9:                    +0.00 V  (min =  +0.00 V, max =  +0.00 V)
> in10:                   +0.32 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in11:                   +1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in12:                   +1.70 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in13:                   +0.94 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> in14:                   +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> 
> and at that point Vcore was reported as 1.41V (system idle)
> 

Looks like /etc/sensors3.conf on that system is configured to report in0 as
Vcore. So there is a very clear mismatch. Can you report the values seen
when the system is under load?
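
To make that comparison less manual, here is a Python sketch that parses sensors-style output and compares the two Vcore readings. The regular expression only covers the line shapes seen in this thread, the 0.1 V tolerance is an arbitrary assumption, and the function names are made up for this example.

```python
import re

# Matches lines such as "Vcore:        +1.41 V" or "in1:         589.00 mV (min = ...)".
LINE_RE = re.compile(
    r"^\s*(?P<label>[^:]+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?P<unit>mV|V|A|°C)"
)


def parse_sensors(text):
    """Return {label: (value, unit)} for each reading line, with mV converted to V."""
    readings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue                      # chip names, "Adapter:" lines, N/A values
        value = float(m.group("value"))
        unit = m.group("unit")
        if unit == "mV":
            value, unit = value / 1000.0, "V"
        readings[m.group("label").strip()] = (value, unit)
    return readings


def differs(a, b, tolerance=0.1):
    """True when two readings differ by more than the (assumed) tolerance."""
    return abs(a - b) > tolerance


# Idle values from Ken's report: Super-IO chip versus k10temp.
superio = parse_sensors("Vcore:                  +0.30 V  (min =  +0.00 V, max =  +1.74 V)")
k10temp = parse_sensors("Vcore:        +1.41 V")
print(differs(superio["Vcore"][0], k10temp["Vcore"][0]))  # prints True
```

Converting mV to V up front keeps the comparison valid for outputs where sensors switches units, as in the 3800X report later in the thread.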

Thanks,
Guenter
Guenter Roeck Jan. 17, 2020, 6:46 p.m. UTC | #11
On Fri, Jan 17, 2020 at 10:46:25AM +0100, Ondrej Čerman wrote:
> On 16. 1. 2020 at 15:17, Guenter Roeck wrote:
> > This patch series implements various improvements for the k10temp driver.
> > 
> > Patch 1/4 introduces the use of bit operations.
> > 
> > Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
> > API. This not only simplifies the code and reduces its size, it also
> > makes the code easier to maintain and enhance.
> > 
> > Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
> > on Ryzen 3 (Zen2) CPUs.
> > 
> > Patch 4/4 adds support for reporting core and SoC current and voltage
> > information on Ryzen CPUs.
> > 
> > With all patches in place, output on Ryzen 3900 CPUs looks as follows
> > (with the system under load).
> > 
> > k10temp-pci-00c3
> > Adapter: PCI adapter
> > Vcore:        +1.36 V
> > Vsoc:         +1.18 V
> > Tdie:         +86.8°C  (high = +70.0°C)
> > Tctl:         +86.8°C
> > Tccd1:        +80.0°C
> > Tccd2:        +81.8°C
> > Icore:       +44.14 A
> > Isoc:        +13.83 A
> > 
> > The patch series has only been tested with Ryzen 3900 CPUs. Further test
> > coverage will be necessary before the changes can be applied to the Linux
> > kernel.
> > 
> Hello everyone, I am the author of https://github.com/ocerman/zenpower/ .
> 
> It is nice to see this merged.
> 
> I just want to warn you that there have been reported issues with
> Threadripper CPUs to zenpower issue tracker. Also I think that no-one tested
> EPYC CPUs.
> 
> Most of the stuff I was able to figure out by trial-and-error approach and
> unfortunately because I do not own any Threadripper CPU I was not able to
> test and fix reported problems.
> 
Thanks a lot for the note. The key problem seems to be that Threadripper
doesn't report SoC current and voltage. Is that correct ? If so, that
should be easy to solve.

On a side note, drivers/gpu/drm/amd/include/asic_reg/thm/thm_10_0_offset.h
suggests that two more temperature sensors might be available at 0x0005995C
and 0x00059960 (DIE3_TEMP and SW_TEMP). Have you ever tried that ?

Thanks,
Guenter
Ken Moffat Jan. 17, 2020, 6:58 p.m. UTC | #12
On Fri, 17 Jan 2020 at 14:14, Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 1/16/20 8:47 PM, Ken Moffat wrote:
> > unfortunately I don't have any report of in0. I'm guessing I need some
> > module(s) which did not seem to do anything useful in the past.
> >
> > All I have in the 'in' area is
> > nct6779-isa-0290
> > Adapter: ISA adapter
> > Vcore:                  +0.30 V  (min =  +0.00 V, max =  +1.74 V)
> > in1:                    +0.00 V  (min =  +0.00 V, max =  +0.00 V)
> > AVCC:                   +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
> > +3.3V:                  +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM

>
> Looks like someone configured /etc/sensors3.conf on that system which tells it
> to report in0 as Vcore. So there is a very clear mismatch. Can you report
> the values seen when the system is under load ?
>
> Thanks,
> Guenter

I do have sensors3.conf from lm_sensors-3.4.0. Here are the figures
under load.

Vcore:                  +0.65 V  (min =  +0.00 V, max =  +1.74 V)

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.27 V
Vsoc:         +0.89 V
Tdie:         +46.2°C  (high = +70.0°C)
Tctl:         +46.2°C
Icore:       +48.84 A
Isoc:        +10.10 A

ĸen
Sebastian Reichel Jan. 17, 2020, 7:15 p.m. UTC | #13
Hi,

On Thu, Jan 16, 2020 at 06:17:56AM -0800, Guenter Roeck wrote:
> This patch series implements various improvements for the k10temp driver.
> 
> Patch 1/4 introduces the use of bit operations.
> 
> Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
> API. This not only simplifies the code and reduces its size, it also
> makes the code easier to maintain and enhance. 
> 
> Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
> on Ryzen 3 (Zen2) CPUs.
> 
> Patch 4/4 adds support for reporting core and SoC current and voltage
> information on Ryzen CPUs.
> 
> With all patches in place, output on Ryzen 3900 CPUs looks as follows
> (with the system under load).
> 
> k10temp-pci-00c3
> Adapter: PCI adapter
> Vcore:        +1.36 V
> Vsoc:         +1.18 V
> Tdie:         +86.8°C  (high = +70.0°C)
> Tctl:         +86.8°C
> Tccd1:        +80.0°C
> Tccd2:        +81.8°C
> Icore:       +44.14 A
> Isoc:        +13.83 A
> 
> The patch series has only been tested with Ryzen 3900 CPUs. Further test
> coverage will be necessary before the changes can be applied to the Linux
> kernel.

Looks ok on 3800X (idle):

$ lscpu | grep "Model name"
Model name:                      AMD Ryzen 7 3800X 8-Core Processor
$ sensors "k10temp-*"
k10temp-pci-00c3
Adapter: PCI adapter
Vcore:       937.00 mV 
Vsoc:          1.01 V  
Tdie:         +35.2°C  (high = +70.0°C)
Tctl:         +35.2°C  
Tccd1:        +35.8°C  
Icore:         4.61 A  
Isoc:          6.18 A  

And after compiling the kernel with 32 threads for 1 minute:

$ sensors "k10temp-*" 
k10temp-pci-00c3
Adapter: PCI adapter
Vcore:         1.29 V  
Vsoc:          1.01 V  
Tdie:         +77.1°C  (high = +70.0°C)
Tctl:         +77.1°C  
Tccd1:        +78.8°C  
Icore:        39.53 A  
Isoc:          6.18 A  

Board Information during the idle check:

$ sudo dmidecode -s system-manufacturer
Gigabyte Technology Co., Ltd.
$ sudo dmidecode -s system-product-name
X570 AORUS ULTRA
$ sensors "it8792-*"
it8792-isa-0a60
Adapter: ISA adapter
in0:           1.79 V  (min =  +0.00 V, max =  +2.78 V)
in1:         589.00 mV (min =  +0.00 V, max =  +2.78 V)
in2:         981.00 mV (min =  +0.00 V, max =  +2.78 V)
+3.3V:         1.68 V  (min =  +0.00 V, max =  +2.78 V)
in4:           1.79 V  (min =  +0.00 V, max =  +2.78 V)
in5:           1.18 V  (min =  +0.00 V, max =  +2.78 V)
in6:           2.78 V  (min =  +0.00 V, max =  +2.78 V)  ALARM
3VSB:          1.68 V  (min =  +0.00 V, max =  +2.78 V)
Vbat:          1.61 V  
fan1:           0 RPM  (min =    0 RPM)
fan2:           0 RPM  (min =    0 RPM)
fan3:           0 RPM  (min =    0 RPM)
temp1:        +37.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp3:        +36.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
intrusion0:  ALARM

-- Sebastian
Ondrej Čerman Jan. 17, 2020, 10:48 p.m. UTC | #14
On 17. 1. 2020 at 19:46, Guenter Roeck wrote:
> On Fri, Jan 17, 2020 at 10:46:25AM +0100, Ondrej Čerman wrote:
>> On 16. 1. 2020 at 15:17, Guenter Roeck wrote:
>>> This patch series implements various improvements for the k10temp driver.
>>>
>>> Patch 1/4 introduces the use of bit operations.
>>>
>>> Patch 2/4 converts the driver to use the devm_hwmon_device_register_with_info
>>> API. This not only simplifies the code and reduces its size, it also
>>> makes the code easier to maintain and enhance.
>>>
>>> Patch 3/4 adds support for reporting Core Complex Die (CCD) temperatures
>>> on Ryzen 3 (Zen2) CPUs.
>>>
>>> Patch 4/4 adds support for reporting core and SoC current and voltage
>>> information on Ryzen CPUs.
>>>
>>> With all patches in place, output on Ryzen 3900 CPUs looks as follows
>>> (with the system under load).
>>>
>>> k10temp-pci-00c3
>>> Adapter: PCI adapter
>>> Vcore:        +1.36 V
>>> Vsoc:         +1.18 V
>>> Tdie:         +86.8°C  (high = +70.0°C)
>>> Tctl:         +86.8°C
>>> Tccd1:        +80.0°C
>>> Tccd2:        +81.8°C
>>> Icore:       +44.14 A
>>> Isoc:        +13.83 A
>>>
>>> The patch series has only been tested with Ryzen 3900 CPUs. Further test
>>> coverage will be necessary before the changes can be applied to the Linux
>>> kernel.
>>>
>> Hello everyone, I am the author of https://github.com/ocerman/zenpower/ .
>>
>> It is nice to see this merged.
>>
>> I just want to warn you that there have been reported issues with
>> Threadripper CPUs to zenpower issue tracker. Also I think that no-one tested
>> EPYC CPUs.
>>
>> Most of the stuff I was able to figure out by trial-and-error approach and
>> unfortunately because I do not own any Threadripper CPU I was not able to
>> test and fix reported problems.
>>
> Thanks a lot for the note. The key problem seems to be that Threadripper
> doesn't report SoC current and voltage. Is that correct ? If so, that
> should be easy to solve.

Hello,

I thought so initially, but I was wrong. It seems that these multi-node
CPUs report SoC and core voltage/current data at a particular node. See
this HWiNFO64 screenshot of a 2990WX for reference:
https://i.imgur.com/yM9X5nd.jpg . They may also be using different
addresses and/or factors.

> On a side note, drivers/gpu/drm/amd/include/asic_reg/thm/thm_10_0_offset.h
> suggests that two more temperature sensors might be available at 0x0005995C
> and 0x00059960 (DIE3_TEMP and SW_TEMP). Have you ever tried that ?
>
> Thanks,
> Guenter

I was aware of 0x0005995c and thought that it could be Tdie3 (that's why
I included it in the debug output; someone already reported that the 3960X
returns data at that address). I think this one can be safely included.

I was not aware of the other address, I will try it.

Ondrej.
Brad Campbell Jan. 18, 2020, 8:52 a.m. UTC | #15
On 16/1/20 10:17 pm, Guenter Roeck wrote:
> This patch series implements various improvements for the k10temp driver.
> 

Looks good here. Identical motherboards (ASUS x370 Prime-Pro), different 
CPUs.

3950x

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.38 V
Vsoc:         +1.08 V
Tdie:         +69.1°C  (high = +70.0°C)
Tctl:         +69.1°C
Tccd1:        +54.2°C
Tccd2:        +57.0°C
Icore:       +27.67 A
Isoc:        +14.13 A

it8665-isa-0290
Adapter: ISA adapter
Vcore:        +1.41 V  (min =  +0.83 V, max =  +1.65 V)
in1:          +2.51 V  (min =  +1.98 V, max =  +2.73 V)
+12V:        +11.98 V  (min = +11.20 V, max = +12.40 V)
+5V:          +5.01 V  (min =  +4.74 V, max =  +5.61 V)
3VSB:         +6.67 V  (min =  +2.83 V, max =  +3.40 V)
Vbat:         +6.58 V
+3.3V:        +3.33 V
CPU Fan:     3409 RPM  (min = 1500 RPM)
Back Fan:       0 RPM  (min =    0 RPM)
MB CPU Temp:  +56.0°C  (low  = +13.0°C, high = +88.0°C)
Ambient:      +35.0°C  (low  = +13.0°C, high = +43.0°C)  sensor = thermistor
PCH:          +46.0°C  (low  = +18.0°C, high = +61.0°C)  sensor = thermistor

1800x

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:        +1.26 V
Vsoc:         +0.91 V
Tdie:         +36.0°C  (high = +70.0°C)
Tctl:         +56.0°C
Icore:       +15.59 A
Isoc:         +7.94 A

it8665-isa-0290
Adapter: ISA adapter
Vcore:        +1.25 V  (min =  +0.83 V, max =  +1.65 V)
in1:          +2.48 V  (min =  +1.98 V, max =  +2.73 V)
+12V:        +11.98 V  (min = +11.20 V, max = +12.40 V)
+5V:          +4.96 V  (min =  +4.74 V, max =  +5.61 V)
3VSB:         +6.54 V  (min =  +2.83 V, max =  +3.40 V)
Vbat:         +6.37 V
+3.3V:        +3.31 V
CPU Fan:     1171 RPM  (min = 1500 RPM)  ALARM
Back Fan:       0 RPM  (min =    0 RPM)
MB CPU Temp:  +36.0°C  (low  = +13.0°C, high = +88.0°C)
Ambient:      +44.0°C  (low  = +13.0°C, high = +43.0°C)  sensor = thermistor
PCH:          +38.0°C  (low  = +18.0°C, high = +61.0°C)  sensor = thermistor

Regards,
Brad
Guenter Roeck Jan. 18, 2020, 5:14 p.m. UTC | #16
On 1/18/20 12:52 AM, Brad Campbell wrote:
> On 16/1/20 10:17 pm, Guenter Roeck wrote:
>> This patch series implements various improvements for the k10temp driver.
>>
> 
> Looks good here. Identical motherboards (ASUS x370 Prime-Pro), different CPUs.
> 
> 3950x
> 
Interesting. I thought the 3950X needs a newer motherboard. Is that CPU as amazing
as everyone says it is ? And does it really need liquid cooling ?

Anyway, thanks a lot for testing!

Guenter
Brad Campbell Jan. 19, 2020, 1:59 a.m. UTC | #17
On 19/1/20 1:14 am, Guenter Roeck wrote:
> On 1/18/20 12:52 AM, Brad Campbell wrote:
>> On 16/1/20 10:17 pm, Guenter Roeck wrote:
>>> This patch series implements various improvements for the k10temp 
>>> driver.
>>>
>>
>> Looks good here. Identical motherboards (ASUS x370 Prime-Pro), 
>> different CPUs.
>>
>> 3950x
>>
> Interesting. I thought the 3950X needs a newer motherboard. 

This board has the 3950x listed on the compatibility matrix, so I took 
the punt. I am running a beta BIOS with the 1.0.0.4 AGESA but it's 
running in a stock production environment and has been stable. I don't 
overclock or game, it's predominantly a VM host.

> Is that CPU as amazing as everyone says it is ? 

It is fairly impressive and a significant update over the 1800x it 
replaced. Kernel compiles are pretty quick :)

> And does it really need liquid cooling ?

No. I'm using a stock AMD Wraith Prism cooler in a 4U rack case.

It might reach higher boost clocks with better cooling, but under an 
all-core load I still see > 4.1GHz across all cores.

Regards,
Brad