diff mbox series

[2/3] hwmon: (amd_energy) Add documentation

Message ID 20200501175003.28613-2-nchatrad@amd.com (mailing list archive)
State Changes Requested
Headers show
Series [1/3] hwmon: Add amd_energy driver to report energy counters | expand

Commit Message

Naveen Krishna Chatradhi May 1, 2020, 5:50 p.m. UTC
Document amd_energy driver with all chips supported by it.

Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
---
Changes in v5: None

 Documentation/hwmon/amd_energy.rst | 100 +++++++++++++++++++++++++++++
 Documentation/hwmon/index.rst      |   1 +
 2 files changed, 101 insertions(+)
 create mode 100644 Documentation/hwmon/amd_energy.rst

Comments

Guenter Roeck May 6, 2020, 4:33 p.m. UTC | #1
On Fri, May 01, 2020 at 11:20:02PM +0530, Naveen Krishna Chatradhi wrote:
> Document amd_energy driver with all chips supported by it.
> 
> Cc: Guenter Roeck <linux@roeck-us.net>
> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
> ---
> Changes in v5: None
> 
>  Documentation/hwmon/amd_energy.rst | 100 +++++++++++++++++++++++++++++
>  Documentation/hwmon/index.rst      |   1 +
>  2 files changed, 101 insertions(+)
>  create mode 100644 Documentation/hwmon/amd_energy.rst
> 
> diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst
> new file mode 100644
> index 000000000000..2216c8b13e58
> --- /dev/null
> +++ b/Documentation/hwmon/amd_energy.rst
> @@ -0,0 +1,100 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Kernel driver amd_energy
> +==========================
> +
> +Supported chips:
> +
> +* AMD Family 17h Processors
> +
> +  Prefix: 'amd_energy'
> +
> +  Addresses used:  RAPL MSRs
> +
> +  Datasheets:
> +
> +  - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors
> +
> +	https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip
> +
> +  - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors
> +
> +	https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
> +
> +Author: Naveen Krishna Chatradhi <nchatrad@amd.com>
> +
> +Description
> +-----------
> +
> +The Energy driver exposes the energy counters that are
> +reported via the Running Average Power Limit (RAPL)
> +Model-specific Registers (MSRs) via the hardware monitor
> +(HWMON) sysfs interface.
> +
> +1. Power, Energy and Time Units
> +   MSR_RAPL_POWER_UNIT/ C001_0299:
> +   shared with all cores in the socket
> +
> +2. Energy consumed by each Core
> +   MSR_CORE_ENERGY_STATUS/ C001_029A:
> +   32-bitRO, Accumulator, core-level power reporting
> +
> +3. Energy consumed by Socket
> +   MSR_PACKAGE_ENERGY_STATUS/ C001_029B:
> +   32-bitRO, Accumulator, socket-level power reporting,
> +   shared with all cores in socket
> +
> +These registers are updated every 1ms and cleared on
> +reset of the system.
> +
> +Energy Caluclation
> +------------------
> +
> +Energy information (in Joules) is based on the multiplier,
> +1/2^ESU; where ESU is an unsigned integer read from
> +MSR_RAPL_POWER_UNIT register. Default value is 10000b,
> +indicating energy status unit is 15.3 micro-Joules increment.
> +
> +Reported values are scaled as per the formula
> +
> +scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in Joules
> +
> +Users calculate power for a given domain by calculating
> +	dEnergy/dTime for that domain.
> +
> +Socket energy accumulation
> +--------------------------
> +
> +Current Socket energy status register is 32bit, assuming a 240W
> +system, the register would wrap around in
> +
> +	2^32*15.3 e-6/240 = 273.80416512 secs to wrap(~4.5 mins)
> +
> +To improve the wrap around time, a kernel thread is implemented
> +to accumulate the socket energy counter to a 64-bit counter. The
> +kernel thread starts running during probe, wakes up at 100secs

wakes up every 100 seconds

> +and stops running in remove.

stops running when the driver is removed.

All counters need to be be updated by the kernel thread, not just the socket
counter. If the socket counter can wrap in 4.5 minutes, the matching per-core
counters on a 64-core system can wrap every 4.5 * 64 = 288 minutes, which
isn't much better. This might be even worse on a system with fewer cores and
higher per-core power.

> +
> +A socket energy read would return the current register value
> +added to the respective energy accumulator.
> +
> +Sysfs attributes
> +----------------
> +
> +=============== ========  =====================================
> +Attribute	Label	  Description
> +===============	========  =====================================
> +
> +* For index N between [1] and [nr_cpus]
> +
> +===============	========  ======================================
> +energy[N]_input EcoreX	  Core Energy   X = [0] to [nr_cpus - 1]
> +			  Measured input core energy
> +===============	========  ======================================
> +
> +* For N between [nr_cpus] and [nr_cpus + nr_socks]
> +
> +===============	========  ======================================
> +energy[N]_input EsocketX  Socket Energy X = [0] to [nr_socks -1]
> +			  Measured input socket energy
> +=============== ========  ======================================
> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> index 8ef62fd39787..fc4b89810e67 100644
> --- a/Documentation/hwmon/index.rst
> +++ b/Documentation/hwmon/index.rst
> @@ -39,6 +39,7 @@ Hardware Monitoring Kernel Drivers
>     adt7470
>     adt7475
>     amc6821
> +   amd_energy
>     asb100
>     asc7621
>     aspeed-pwm-tacho
Naveen Krishna Ch May 6, 2020, 5:11 p.m. UTC | #2
Hi Guenter

On Wed, 6 May 2020 at 22:03, Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Fri, May 01, 2020 at 11:20:02PM +0530, Naveen Krishna Chatradhi wrote:
> > Document amd_energy driver with all chips supported by it.
> >
> > Cc: Guenter Roeck <linux@roeck-us.net>
> > Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
> > ---
> > Changes in v5: None
> >
> >  Documentation/hwmon/amd_energy.rst | 100 +++++++++++++++++++++++++++++
> >  Documentation/hwmon/index.rst      |   1 +
> >  2 files changed, 101 insertions(+)
> >  create mode 100644 Documentation/hwmon/amd_energy.rst
> >
> > diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst
> > new file mode 100644
> > index 000000000000..2216c8b13e58
> > --- /dev/null
> > +++ b/Documentation/hwmon/amd_energy.rst
> > @@ -0,0 +1,100 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +Kernel driver amd_energy
> > +==========================
> > +
> > +Supported chips:
> > +
> > +* AMD Family 17h Processors
> > +
> > +  Prefix: 'amd_energy'
> > +
> > +  Addresses used:  RAPL MSRs
> > +
> > +  Datasheets:
> > +
> > +  - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors
> > +
> > +     https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip
> > +
> > +  - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors
> > +
> > +     https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
> > +
> > +Author: Naveen Krishna Chatradhi <nchatrad@amd.com>
> > +
> > +Description
> > +-----------
> > +
> > +The Energy driver exposes the energy counters that are
> > +reported via the Running Average Power Limit (RAPL)
> > +Model-specific Registers (MSRs) via the hardware monitor
> > +(HWMON) sysfs interface.
> > +
> > +1. Power, Energy and Time Units
> > +   MSR_RAPL_POWER_UNIT/ C001_0299:
> > +   shared with all cores in the socket
> > +
> > +2. Energy consumed by each Core
> > +   MSR_CORE_ENERGY_STATUS/ C001_029A:
> > +   32-bitRO, Accumulator, core-level power reporting
> > +
> > +3. Energy consumed by Socket
> > +   MSR_PACKAGE_ENERGY_STATUS/ C001_029B:
> > +   32-bitRO, Accumulator, socket-level power reporting,
> > +   shared with all cores in socket
> > +
> > +These registers are updated every 1ms and cleared on
> > +reset of the system.
> > +
> > +Energy Caluclation
> > +------------------
> > +
> > +Energy information (in Joules) is based on the multiplier,
> > +1/2^ESU; where ESU is an unsigned integer read from
> > +MSR_RAPL_POWER_UNIT register. Default value is 10000b,
> > +indicating energy status unit is 15.3 micro-Joules increment.
> > +
> > +Reported values are scaled as per the formula
> > +
> > +scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in Joules
> > +
> > +Users calculate power for a given domain by calculating
> > +     dEnergy/dTime for that domain.
> > +
> > +Socket energy accumulation
> > +--------------------------
> > +
> > +Current Socket energy status register is 32bit, assuming a 240W
> > +system, the register would wrap around in
> > +
> > +     2^32*15.3 e-6/240 = 273.80416512 secs to wrap(~4.5 mins)
> > +
> > +To improve the wrap around time, a kernel thread is implemented
> > +to accumulate the socket energy counter to a 64-bit counter. The
> > +kernel thread starts running during probe, wakes up at 100secs
>
> wakes up every 100 seconds
>
> > +and stops running in remove.
>
> stops running when the driver is removed.
Will correct them
>
> All counters need to be be updated by the kernel thread, not just the socket
> counter. If the socket counter can wrap in 4.5 minutes, the matching per-core
> counters on a 64-core system can wrap every 4.5 * 64 = 288 minutes, which
> isn't much better. This might be even worse on a system with fewer cores and
> higher per-core power.

Agreed, just need few clarifications though
1. Is it OK to implement another thread for cores alone, as it need not run as
frequently as the socket thread.
2. We have a scenario on servers, a thread accumulating energy for all 128 cores
might compromise the compute. So, i would like to provide a configuration
symbol or sysfs mechanism to enable/disable the core accumulation.

>
> > +
> > +A socket energy read would return the current register value
> > +added to the respective energy accumulator.
> > +
> > +Sysfs attributes
> > +----------------
> > +
> > +=============== ========  =====================================
> > +Attribute    Label     Description
> > +===============      ========  =====================================
> > +
> > +* For index N between [1] and [nr_cpus]
> > +
> > +===============      ========  ======================================
> > +energy[N]_input EcoreX         Core Energy   X = [0] to [nr_cpus - 1]
> > +                       Measured input core energy
> > +===============      ========  ======================================
> > +
> > +* For N between [nr_cpus] and [nr_cpus + nr_socks]
> > +
> > +===============      ========  ======================================
> > +energy[N]_input EsocketX  Socket Energy X = [0] to [nr_socks -1]
> > +                       Measured input socket energy
> > +=============== ========  ======================================
> > diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> > index 8ef62fd39787..fc4b89810e67 100644
> > --- a/Documentation/hwmon/index.rst
> > +++ b/Documentation/hwmon/index.rst
> > @@ -39,6 +39,7 @@ Hardware Monitoring Kernel Drivers
> >     adt7470
> >     adt7475
> >     amc6821
> > +   amd_energy
> >     asb100
> >     asc7621
> >     aspeed-pwm-tacho
Guenter Roeck May 6, 2020, 7:27 p.m. UTC | #3
Hi,

On 5/6/20 10:11 AM, Naveen Krishna Ch wrote:
> Hi Guenter
> 
> On Wed, 6 May 2020 at 22:03, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On Fri, May 01, 2020 at 11:20:02PM +0530, Naveen Krishna Chatradhi wrote:
>>> Document amd_energy driver with all chips supported by it.
>>>
>>> Cc: Guenter Roeck <linux@roeck-us.net>
>>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
>>> ---
>>> Changes in v5: None
>>>
>>>  Documentation/hwmon/amd_energy.rst | 100 +++++++++++++++++++++++++++++
>>>  Documentation/hwmon/index.rst      |   1 +
>>>  2 files changed, 101 insertions(+)
>>>  create mode 100644 Documentation/hwmon/amd_energy.rst
>>>
>>> diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst
>>> new file mode 100644
>>> index 000000000000..2216c8b13e58
>>> --- /dev/null
>>> +++ b/Documentation/hwmon/amd_energy.rst
>>> @@ -0,0 +1,100 @@
>>> +.. SPDX-License-Identifier: GPL-2.0
>>> +
>>> +Kernel driver amd_energy
>>> +==========================
>>> +
>>> +Supported chips:
>>> +
>>> +* AMD Family 17h Processors
>>> +
>>> +  Prefix: 'amd_energy'
>>> +
>>> +  Addresses used:  RAPL MSRs
>>> +
>>> +  Datasheets:
>>> +
>>> +  - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors
>>> +
>>> +     https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip
>>> +
>>> +  - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors
>>> +
>>> +     https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
>>> +
>>> +Author: Naveen Krishna Chatradhi <nchatrad@amd.com>
>>> +
>>> +Description
>>> +-----------
>>> +
>>> +The Energy driver exposes the energy counters that are
>>> +reported via the Running Average Power Limit (RAPL)
>>> +Model-specific Registers (MSRs) via the hardware monitor
>>> +(HWMON) sysfs interface.
>>> +
>>> +1. Power, Energy and Time Units
>>> +   MSR_RAPL_POWER_UNIT/ C001_0299:
>>> +   shared with all cores in the socket
>>> +
>>> +2. Energy consumed by each Core
>>> +   MSR_CORE_ENERGY_STATUS/ C001_029A:
>>> +   32-bitRO, Accumulator, core-level power reporting
>>> +
>>> +3. Energy consumed by Socket
>>> +   MSR_PACKAGE_ENERGY_STATUS/ C001_029B:
>>> +   32-bitRO, Accumulator, socket-level power reporting,
>>> +   shared with all cores in socket
>>> +
>>> +These registers are updated every 1ms and cleared on
>>> +reset of the system.
>>> +
>>> +Energy Caluclation
>>> +------------------
>>> +
>>> +Energy information (in Joules) is based on the multiplier,
>>> +1/2^ESU; where ESU is an unsigned integer read from
>>> +MSR_RAPL_POWER_UNIT register. Default value is 10000b,
>>> +indicating energy status unit is 15.3 micro-Joules increment.
>>> +
>>> +Reported values are scaled as per the formula
>>> +
>>> +scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in Joules
>>> +
>>> +Users calculate power for a given domain by calculating
>>> +     dEnergy/dTime for that domain.
>>> +
>>> +Socket energy accumulation
>>> +--------------------------
>>> +
>>> +Current Socket energy status register is 32bit, assuming a 240W
>>> +system, the register would wrap around in
>>> +
>>> +     2^32*15.3 e-6/240 = 273.80416512 secs to wrap(~4.5 mins)
>>> +
>>> +To improve the wrap around time, a kernel thread is implemented
>>> +to accumulate the socket energy counter to a 64-bit counter. The
>>> +kernel thread starts running during probe, wakes up at 100secs
>>
>> wakes up every 100 seconds
>>
>>> +and stops running in remove.
>>
>> stops running when the driver is removed.
> Will correct them
>>
>> All counters need to be be updated by the kernel thread, not just the socket
>> counter. If the socket counter can wrap in 4.5 minutes, the matching per-core
>> counters on a 64-core system can wrap every 4.5 * 64 = 288 minutes, which
>> isn't much better. This might be even worse on a system with fewer cores and
>> higher per-core power.
> 
> Agreed, just need few clarifications though
> 1. Is it OK to implement another thread for cores alone, as it need not run as
> frequently as the socket thread.

Your call, but personally I think it is not worth the overhead; see below.

> 2. We have a scenario on servers, a thread accumulating energy for all 128 cores
> might compromise the compute. So, i would like to provide a configuration
> symbol or sysfs mechanism to enable/disable the core accumulation.
> 

Another option would be to use a single thread but only update a single core
per socket at a time. If the socket thread needs to run every N seconds,
one would assume that the core thread only needs to run every N * (number
of cores) seconds (assuming that it uses the same scale). If so, reading
the data for one core (or maybe a couple of cores if the scale is different)
plus the data for the socket should not be that expensive.

If that is not acceptable, it might make more sense to blacklist the driver
entirely in such situations; without accumulation the reported values are
pretty much worthless.

Thanks,
Guenter

>>
>>> +
>>> +A socket energy read would return the current register value
>>> +added to the respective energy accumulator.
>>> +
>>> +Sysfs attributes
>>> +----------------
>>> +
>>> +=============== ========  =====================================
>>> +Attribute    Label     Description
>>> +===============      ========  =====================================
>>> +
>>> +* For index N between [1] and [nr_cpus]
>>> +
>>> +===============      ========  ======================================
>>> +energy[N]_input EcoreX         Core Energy   X = [0] to [nr_cpus - 1]
>>> +                       Measured input core energy
>>> +===============      ========  ======================================
>>> +
>>> +* For N between [nr_cpus] and [nr_cpus + nr_socks]
>>> +
>>> +===============      ========  ======================================
>>> +energy[N]_input EsocketX  Socket Energy X = [0] to [nr_socks -1]
>>> +                       Measured input socket energy
>>> +=============== ========  ======================================
>>> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
>>> index 8ef62fd39787..fc4b89810e67 100644
>>> --- a/Documentation/hwmon/index.rst
>>> +++ b/Documentation/hwmon/index.rst
>>> @@ -39,6 +39,7 @@ Hardware Monitoring Kernel Drivers
>>>     adt7470
>>>     adt7475
>>>     amc6821
>>> +   amd_energy
>>>     asb100
>>>     asc7621
>>>     aspeed-pwm-tacho
> 
> 
>
Naveen Krishna Ch May 11, 2020, 4:49 p.m. UTC | #4
Hi Guenter

On Thu, 7 May 2020 at 00:57, Guenter Roeck <linux@roeck-us.net> wrote:
>
> Hi,
>
> On 5/6/20 10:11 AM, Naveen Krishna Ch wrote:
> > Hi Guenter
> >
> > On Wed, 6 May 2020 at 22:03, Guenter Roeck <linux@roeck-us.net> wrote:
> >>
> >> On Fri, May 01, 2020 at 11:20:02PM +0530, Naveen Krishna Chatradhi wrote:
> >>> Document amd_energy driver with all chips supported by it.
> >>>
> >>> Cc: Guenter Roeck <linux@roeck-us.net>
> >>> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
> >>> ---
> >>> Changes in v5: None
> >>>
> >>>  Documentation/hwmon/amd_energy.rst | 100 +++++++++++++++++++++++++++++
> >>>  Documentation/hwmon/index.rst      |   1 +
> >>>  2 files changed, 101 insertions(+)
> >>>  create mode 100644 Documentation/hwmon/amd_energy.rst
> >>>
> >>> diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst
> >>> new file mode 100644
> >>> index 000000000000..2216c8b13e58
> >>> --- /dev/null
> >>> +++ b/Documentation/hwmon/amd_energy.rst
> >>> @@ -0,0 +1,100 @@
> >>> +.. SPDX-License-Identifier: GPL-2.0
> >>> +
> >>> +Kernel driver amd_energy
> >>> +==========================
> >>> +
> >>> +Supported chips:
> >>> +
> >>> +* AMD Family 17h Processors
> >>> +
> >>> +  Prefix: 'amd_energy'
> >>> +
> >>> +  Addresses used:  RAPL MSRs
> >>> +
> >>> +  Datasheets:
> >>> +
> >>> +  - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors
> >>> +
> >>> +     https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip
> >>> +
> >>> +  - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors
> >>> +
> >>> +     https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
> >>> +
> >>> +Author: Naveen Krishna Chatradhi <nchatrad@amd.com>
> >>> +
> >>> +Description
> >>> +-----------
> >>> +
> >>> +The Energy driver exposes the energy counters that are
> >>> +reported via the Running Average Power Limit (RAPL)
> >>> +Model-specific Registers (MSRs) via the hardware monitor
> >>> +(HWMON) sysfs interface.
> >>> +
> >>> +1. Power, Energy and Time Units
> >>> +   MSR_RAPL_POWER_UNIT/ C001_0299:
> >>> +   shared with all cores in the socket
> >>> +
> >>> +2. Energy consumed by each Core
> >>> +   MSR_CORE_ENERGY_STATUS/ C001_029A:
> >>> +   32-bitRO, Accumulator, core-level power reporting
> >>> +
> >>> +3. Energy consumed by Socket
> >>> +   MSR_PACKAGE_ENERGY_STATUS/ C001_029B:
> >>> +   32-bitRO, Accumulator, socket-level power reporting,
> >>> +   shared with all cores in socket
> >>> +
> >>> +These registers are updated every 1ms and cleared on
> >>> +reset of the system.
> >>> +
> >>> +Energy Caluclation
> >>> +------------------
> >>> +
> >>> +Energy information (in Joules) is based on the multiplier,
> >>> +1/2^ESU; where ESU is an unsigned integer read from
> >>> +MSR_RAPL_POWER_UNIT register. Default value is 10000b,
> >>> +indicating energy status unit is 15.3 micro-Joules increment.
> >>> +
> >>> +Reported values are scaled as per the formula
> >>> +
> >>> +scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in Joules
> >>> +
> >>> +Users calculate power for a given domain by calculating
> >>> +     dEnergy/dTime for that domain.
> >>> +
> >>> +Socket energy accumulation
> >>> +--------------------------
> >>> +
> >>> +Current Socket energy status register is 32bit, assuming a 240W
> >>> +system, the register would wrap around in
> >>> +
> >>> +     2^32*15.3 e-6/240 = 273.80416512 secs to wrap(~4.5 mins)
> >>> +
> >>> +To improve the wrap around time, a kernel thread is implemented
> >>> +to accumulate the socket energy counter to a 64-bit counter. The
> >>> +kernel thread starts running during probe, wakes up at 100secs
> >>
> >> wakes up every 100 seconds
> >>
> >>> +and stops running in remove.
> >>
> >> stops running when the driver is removed.
> > Will correct them
> >>
> >> All counters need to be be updated by the kernel thread, not just the socket
> >> counter. If the socket counter can wrap in 4.5 minutes, the matching per-core
> >> counters on a 64-core system can wrap every 4.5 * 64 = 288 minutes, which
> >> isn't much better. This might be even worse on a system with fewer cores and
> >> higher per-core power.
> >
> > Agreed, just need few clarifications though
> > 1. Is it OK to implement another thread for cores alone, as it need not run as
> > frequently as the socket thread.
>
> Your call, but personally I think it is not worth the overhead; see below.
>
> > 2. We have a scenario on servers, a thread accumulating energy for all 128 cores
> > might compromise the compute. So, i would like to provide a configuration
> > symbol or sysfs mechanism to enable/disable the core accumulation.
> >
>
> Another option would be to use a single thread but only update a single core
> per socket at a time. If the socket thread needs to run every N seconds,
> one would assume that the core thread only needs to run every N * (number
> of cores) seconds (assuming that it uses the same scale). If so, reading
> the data for one core (or maybe a couple of cores if the scale is different)
> plus the data for the socket should not be that expensive.
This is good and possible. Thanks
>
> If that is not acceptable, it might make more sense to blacklist the driver
> entirely in such situations; without accumulation the reported values are
> pretty much worthless.
Sure, will implement core accumulation as well.
>
> Thanks,
> Guenter
>
> >>
> >>> +
> >>> +A socket energy read would return the current register value
> >>> +added to the respective energy accumulator.
> >>> +
> >>> +Sysfs attributes
> >>> +----------------
> >>> +
> >>> +=============== ========  =====================================
> >>> +Attribute    Label     Description
> >>> +===============      ========  =====================================
> >>> +
> >>> +* For index N between [1] and [nr_cpus]
> >>> +
> >>> +===============      ========  ======================================
> >>> +energy[N]_input EcoreX         Core Energy   X = [0] to [nr_cpus - 1]
> >>> +                       Measured input core energy
> >>> +===============      ========  ======================================
> >>> +
> >>> +* For N between [nr_cpus] and [nr_cpus + nr_socks]
> >>> +
> >>> +===============      ========  ======================================
> >>> +energy[N]_input EsocketX  Socket Energy X = [0] to [nr_socks -1]
> >>> +                       Measured input socket energy
> >>> +=============== ========  ======================================
> >>> diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
> >>> index 8ef62fd39787..fc4b89810e67 100644
> >>> --- a/Documentation/hwmon/index.rst
> >>> +++ b/Documentation/hwmon/index.rst
> >>> @@ -39,6 +39,7 @@ Hardware Monitoring Kernel Drivers
> >>>     adt7470
> >>>     adt7475
> >>>     amc6821
> >>> +   amd_energy
> >>>     asb100
> >>>     asc7621
> >>>     aspeed-pwm-tacho
> >
> >
> >
>
diff mbox series

Patch

diff --git a/Documentation/hwmon/amd_energy.rst b/Documentation/hwmon/amd_energy.rst
new file mode 100644
index 000000000000..2216c8b13e58
--- /dev/null
+++ b/Documentation/hwmon/amd_energy.rst
@@ -0,0 +1,100 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel driver amd_energy
+==========================
+
+Supported chips:
+
+* AMD Family 17h Processors
+
+  Prefix: 'amd_energy'
+
+  Addresses used:  RAPL MSRs
+
+  Datasheets:
+
+  - Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors
+
+	https://developer.amd.com/wp-content/resources/55570-B1_PUB.zip
+
+  - Preliminary Processor Programming Reference (PPR) for AMD Family 17h Model 31h, Revision B0 Processors
+
+	https://developer.amd.com/wp-content/resources/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
+
+Author: Naveen Krishna Chatradhi <nchatrad@amd.com>
+
+Description
+-----------
+
+The Energy driver exposes the energy counters that are
+reported via the Running Average Power Limit (RAPL)
+Model-specific Registers (MSRs) via the hardware monitor
+(HWMON) sysfs interface.
+
+1. Power, Energy and Time Units
+   MSR_RAPL_POWER_UNIT/ C001_0299:
+   shared with all cores in the socket
+
+2. Energy consumed by each Core
+   MSR_CORE_ENERGY_STATUS/ C001_029A:
+   32-bitRO, Accumulator, core-level power reporting
+
+3. Energy consumed by Socket
+   MSR_PACKAGE_ENERGY_STATUS/ C001_029B:
+   32-bitRO, Accumulator, socket-level power reporting,
+   shared with all cores in socket
+
+These registers are updated every 1ms and cleared on
+reset of the system.
+
+Energy Caluclation
+------------------
+
+Energy information (in Joules) is based on the multiplier,
+1/2^ESU; where ESU is an unsigned integer read from
+MSR_RAPL_POWER_UNIT register. Default value is 10000b,
+indicating energy status unit is 15.3 micro-Joules increment.
+
+Reported values are scaled as per the formula
+
+scaled value = ((1/2^ESU) * (Raw value) * 1000000UL) in Joules
+
+Users calculate power for a given domain by calculating
+	dEnergy/dTime for that domain.
+
+Socket energy accumulation
+--------------------------
+
+Current Socket energy status register is 32bit, assuming a 240W
+system, the register would wrap around in
+
+	2^32*15.3 e-6/240 = 273.80416512 secs to wrap(~4.5 mins)
+
+To improve the wrap around time, a kernel thread is implemented
+to accumulate the socket energy counter to a 64-bit counter. The
+kernel thread starts running during probe, wakes up at 100secs
+and stops running in remove.
+
+A socket energy read would return the current register value
+added to the respective energy accumulator.
+
+Sysfs attributes
+----------------
+
+=============== ========  =====================================
+Attribute	Label	  Description
+===============	========  =====================================
+
+* For index N between [1] and [nr_cpus]
+
+===============	========  ======================================
+energy[N]_input EcoreX	  Core Energy   X = [0] to [nr_cpus - 1]
+			  Measured input core energy
+===============	========  ======================================
+
+* For N between [nr_cpus] and [nr_cpus + nr_socks]
+
+===============	========  ======================================
+energy[N]_input EsocketX  Socket Energy X = [0] to [nr_socks -1]
+			  Measured input socket energy
+=============== ========  ======================================
diff --git a/Documentation/hwmon/index.rst b/Documentation/hwmon/index.rst
index 8ef62fd39787..fc4b89810e67 100644
--- a/Documentation/hwmon/index.rst
+++ b/Documentation/hwmon/index.rst
@@ -39,6 +39,7 @@  Hardware Monitoring Kernel Drivers
    adt7470
    adt7475
    amc6821
+   amd_energy
    asb100
    asc7621
    aspeed-pwm-tacho