diff mbox series

[1/3] Documentation: admin-guide: pm: Add efficiency vs. latency tradeoff to uncore documentation

Message ID 20240821131321.824326-2-tero.kristo@linux.intel.com (mailing list archive)
State Changes Requested, archived
Series platform/x86: Add support for Intel uncore ELC feature

Commit Message

Tero Kristo Aug. 21, 2024, 1:10 p.m. UTC
Add documentation about the functionality of efficiency vs. latency tradeoff
control in Intel Xeon processors, and how this is configured via sysfs.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
---
 .../pm/intel_uncore_frequency_scaling.rst     | 51 +++++++++++++++++++
 1 file changed, 51 insertions(+)

Comments

Ilpo Järvinen Aug. 23, 2024, 12:28 p.m. UTC | #1
On Wed, 21 Aug 2024, Tero Kristo wrote:

> Added documentation about the functionality of efficiency vs. latency tradeoff
> control in intel Xeon processors, and how this is configured via sysfs.
> 
> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
> ---
>  .../pm/intel_uncore_frequency_scaling.rst     | 51 +++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
> index 5ab3440e6cee..fb83aa2b744e 100644
> --- a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
> +++ b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
> @@ -113,3 +113,54 @@ to apply at each uncore* level.
>  
>  Support for "current_freq_khz" is available only at each fabric cluster
>  level (i.e., in uncore* directory).
> +
> +Efficiency vs. Latency Tradeoff

Does this section even cover the "tradeoff" part in its body? Why not call 
it directly "Control" after ELC?

> +-------------------------------
> +
> +In the realm of high-performance computing, particularly with Xeon
> +processors, managing uncore frequency is an important aspect of system
> +optimization. Traditionally, the uncore frequency is ramped up rapidly
> +in high load scenarios. While this strategy achieves low latency, which
> +is crucial for time-sensitive computations, it does not necessarily yield
> +the best performance per watt, —a key metric for energy efficiency and
> +operational cost savings.

This entire paragraph feels more prose or history book than documentation 
text. I'd suggest using something that goes more directly into the point
about what ELC brings to the table (I suppose the goal is "performance 
per watt" optimization, even that goal is only implied by the current 
text, not explicitly stated as the goal here).
srinivas pandruvada Aug. 26, 2024, 2:45 p.m. UTC | #2
On Fri, 2024-08-23 at 15:28 +0300, Ilpo Järvinen wrote:
> On Wed, 21 Aug 2024, Tero Kristo wrote:
> 
> > Added documentation about the functionality of efficiency vs.
> > latency tradeoff
> > control in intel Xeon processors, and how this is configured via
> > sysfs.
> > 
> > Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
> > ---
> >  .../pm/intel_uncore_frequency_scaling.rst     | 51
> > +++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> > 
> > diff --git a/Documentation/admin-
> > guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-
> > guide/pm/intel_uncore_frequency_scaling.rst
> > index 5ab3440e6cee..fb83aa2b744e 100644
> > --- a/Documentation/admin-
> > guide/pm/intel_uncore_frequency_scaling.rst
> > +++ b/Documentation/admin-
> > guide/pm/intel_uncore_frequency_scaling.rst
> > @@ -113,3 +113,54 @@ to apply at each uncore* level.
> >  
> >  Support for "current_freq_khz" is available only at each fabric
> > cluster
> >  level (i.e., in uncore* directory).
> > +
> > +Efficiency vs. Latency Tradeoff
> 
> Does this section even cover the "tradeoff" part in its body? Why not
> call 
> it directly "Control" after ELC?
> 
> > +-------------------------------
> > +
> > +In the realm of high-performance computing, particularly with Xeon
> > +processors, managing uncore frequency is an important aspect of
> > system
> > +optimization. Traditionally, the uncore frequency is ramped up
> > rapidly
> > +in high load scenarios. While this strategy achieves low latency,
> > which
> > +is crucial for time-sensitive computations, it does not
> > necessarily yield
> > +the best performance per watt, —a key metric for energy efficiency
> > and
> > +operational cost savings.
> 
> This entire paragraph feels more prose or history book than
> documentation 
> text. I'd suggest using something that goes more directly into the
> point
> about what ELC brings to the table (I suppose the goal is
> "performance 
> per watt" optimization, even that goal is only implied by the current
> text, not explicitly stated as the goal here).
> 

What about this?

Traditionally, the uncore frequency is ramped up to reach the maximum 
possible level based on parameters like EPB (Energy perf Bias) and
other system power management settings programmed by BIOS.  While this
strategy achieves low latency for latency sensitive applications, it
does not necessarily yield the best performance per watt. 

The Efficiency Latency Control (ELC) feature is added to improve
performance per watt. With this feature hardware power management
algorithms optimize trade-off between latency and power consumption.
But for some latency sensitive workloads further tuning can be done
from OS to get desired performance.

The hardware monitors the average CPU utilization across all cores
in a power domain at regular intervals and decides an uncore frequency.
While this may result in the best performance per watt, workload may be
expecting higher performance at the expense of power. Consider an
application that intermittently wakes up to perform memory reads on an
otherwise idle system. In such cases, if hardware lowers uncore
frequency, then there may be delay in ramp up of frequency to meet
target performance. 

The ELC control defines some parameters which can be changed from the OS.
If the average CPU utilization is below a user defined threshold
(elc_low_threshold_percent attribute below), the user defined uncore
floor frequency will be used (elc_floor_freq_khz attribute
below) instead of the hardware calculated minimum.

Similarly in high load scenario where the CPU utilization goes above 
the high threshold value (elc_high_threshold_percent attribute below) 
instead of jumping to maximum uncore frequency, uncore frequency is 
increased in 100MHz steps until the power limit is reached.

Attributes for efficiency latency control: 
.. 
.. 

Thanks,
Srinivas
Ilpo Järvinen Aug. 27, 2024, 8:08 a.m. UTC | #3
On Mon, 26 Aug 2024, srinivas pandruvada wrote:

> On Fri, 2024-08-23 at 15:28 +0300, Ilpo Järvinen wrote:
> > On Wed, 21 Aug 2024, Tero Kristo wrote:
> > 
> > > Added documentation about the functionality of efficiency vs.
> > > latency tradeoff
> > > control in intel Xeon processors, and how this is configured via
> > > sysfs.
> > > 
> > > Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
> > > ---
> > >  .../pm/intel_uncore_frequency_scaling.rst     | 51
> > > +++++++++++++++++++
> > >  1 file changed, 51 insertions(+)
> > > 
> > > diff --git a/Documentation/admin-
> > > guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-
> > > guide/pm/intel_uncore_frequency_scaling.rst
> > > index 5ab3440e6cee..fb83aa2b744e 100644
> > > --- a/Documentation/admin-
> > > guide/pm/intel_uncore_frequency_scaling.rst
> > > +++ b/Documentation/admin-
> > > guide/pm/intel_uncore_frequency_scaling.rst
> > > @@ -113,3 +113,54 @@ to apply at each uncore* level.
> > >  
> > >  Support for "current_freq_khz" is available only at each fabric
> > > cluster
> > >  level (i.e., in uncore* directory).
> > > +
> > > +Efficiency vs. Latency Tradeoff
> > 
> > Does this section even cover the "tradeoff" part in its body? Why not
> > call 
> > it directly "Control" after ELC?
> > 
> > > +-------------------------------
> > > +
> > > +In the realm of high-performance computing, particularly with Xeon
> > > +processors, managing uncore frequency is an important aspect of
> > > system
> > > +optimization. Traditionally, the uncore frequency is ramped up
> > > rapidly
> > > +in high load scenarios. While this strategy achieves low latency,
> > > which
> > > +is crucial for time-sensitive computations, it does not
> > > necessarily yield
> > > +the best performance per watt, —a key metric for energy efficiency
> > > and
> > > +operational cost savings.
> > 
> > This entire paragraph feels more prose or history book than
> > documentation 
> > text. I'd suggest using something that goes more directly into the
> > point
> > about what ELC brings to the table (I suppose the goal is
> > "performance 
> > per watt" optimization, even that goal is only implied by the current
> > text, not explicitly stated as the goal here).
> > 
> 
> What about this?
> 
> Traditionally, the uncore frequency is ramped up to reach the maximum 
> possible level based on parameters like EPB (Energy perf Bias) and
> other system power management settings programmed by BIOS.  While this
> strategy achieves low latency for latency sensitive applications, it
> does not necessarily yield the best performance per watt. 

This again starts on the wrong foot. Don't use words like "traditionally",
"in the past", "historically", "is added", etc. that refer to past time
in documentation text at all. The premise with documentation for feature x 
is that the feature x exists. After these patches have been accepted, the 
reality is that ELC exists and time before does not matter so we don't 
encumber documentation text with that era that has become irrelevant.

You might occasionally have to mention what is not possible without ELC 
in case it's still possible to run stuff without ELC but don't put time 
references to it. However, it's not something you should start with in
the documentation text.

> The Efficiency Latency Control (ELC) feature is added to improve

"is added to improve" -> "improves"

> performance per watt. With this feature hardware power management
> algorithms optimize trade-off between latency and power consumption.
> But for some latency sensitive workloads further tuning can be done
> from OS to get desired performance.

I'd just start with this paragraph. It goes straight into the point and 
is good in that it tries to summarize what ELC tries to achieve.

> The hardware monitors the average CPU utilization across all cores

hardware or ELC-capable HW?

> in a power domain at regular intervals and decides a uncore frequency. 

This kind of feels something that belongs to the first paragraph if it's 
about ELC. (I'm left slightly unsure if ELC refers only to those controls 
mentioned below, or if it is the automatic uncore freq control plus the 
manual controls. I assume it's the latter because of "with this feature 
hardware power management algorithms optimize" sentence.)

> While this may result in the best performance per watt, workload may be
> expecting higher performance at the expense of power. Consider an
> application that intermittently wakes up to perform memory reads on an
> otherwise idle system. In such cases, if hardware lowers uncore
> frequency, then there may be delay in ramp up of frequency to meet
> target performance. 
>
> The ELC control defines some parameters which can be changed from OS.
> If the average CPU utilization is below a user defined threshold
> (elc_low_threshold_percent attribute below), the user defined uncore
> frequency floor frequency will be used (elc_floor_freq_khz attribute 
> below) instead of hardware calculated minimum. 
>
> Similarly in high load scenario where the CPU utilization goes above 
> the high threshold value (elc_high_threshold_percent attribute below) 
> instead of jumping to maximum uncore frequency, uncore frequency is 
> increased in 100MHz steps until the power limit is reached.
>
> Attributes for efficiency latency control: 
> .. 
> .. 

There were a few spaces at the end of lines, those should be removed.
srinivas pandruvada Aug. 27, 2024, 11:30 a.m. UTC | #4
On Tue, 2024-08-27 at 11:08 +0300, Ilpo Järvinen wrote:
> On Mon, 26 Aug 2024, srinivas pandruvada wrote:
> 
> > On Fri, 2024-08-23 at 15:28 +0300, Ilpo Järvinen wrote:
> > > On Wed, 21 Aug 2024, Tero Kristo wrote:
> > > 
> > > > Added documentation about the functionality of efficiency vs.
> > > > latency tradeoff
> > > > control in intel Xeon processors, and how this is configured
> > > > via
> > > > sysfs.
> > > > 
> > > > Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
> > > > ---
> > > >  .../pm/intel_uncore_frequency_scaling.rst     | 51
> > > > +++++++++++++++++++⁠Ayoub, Hatim This seems that when on AC
> > > > mode, Windows don't care about PC10. Is this correct? It seems
> > > > that with EPB=6 we can
> > > >  1 file changed, 51 insertions(+)
> > > > 
> > > > diff --git a/Documentation/admin-
> > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > b/Documentation/admin-
> > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > index 5ab3440e6cee..fb83aa2b744e 100644
> > > > --- a/Documentation/admin-
> > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > +++ b/Documentation/admin-
> > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > @@ -113,3 +113,54 @@ to apply at each uncore* level.
> > > >  
> > > >  Support for "current_freq_khz" is available only at each
> > > > fabric
> > > > cluster
> > > >  level (i.e., in uncore* directory).
> > > > +
> > > > +Efficiency vs. Latency Tradeoff
> > > 
> > > Does this section even cover the "tradeoff" part in its body? Why
> > > not
> > > call 
> > > it directly "Control" after ELC?
> > > 
> > > > +-------------------------------
> > > > +
> > > > +In the realm of high-performance computing, particularly with
> > > > Xeon
> > > > +processors, managing uncore frequency is an important aspect
> > > > of
> > > > system
> > > > +optimization. Traditionally, the uncore frequency is ramped up
> > > > rapidly
> > > > +in high load scenarios. While this strategy achieves low
> > > > latency,
> > > > which
> > > > +is crucial for time-sensitive computations, it does not
> > > > necessarily yield
> > > > +the best performance per watt, —a key metric for energy
> > > > efficiency
> > > > and
> > > > +operational cost savings.
> > > 
> > > This entire paragraph feels more prose or history book than
> > > documentation 
> > > text. I'd suggest using something that goes more directly into
> > > the
> > > point
> > > about what ELC brings to the table (I suppose the goal is
> > > "performance 
> > > per watt" optimization, even that goal is only implied by the
> > > current
> > > text, not explicitly stated as the goal here).
> > > 
> > 
> > What about this?
> > 
> > Traditionally, the uncore frequency is ramped up to reach the
> > maximum 
> > possible level based on parameters like EPB (Energy perf Bias) and
> > other system power management settings programmed by BIOS.  While
> > this
> > strategy achieves low latency for latency sensitive applications,
> > it
> > does not necessarily yield the best performance per watt. 
> 
> This again starts with a wrong foot. Don't use words like
> "traditionally",
> "in the past", "historically", "is added", etc. that refer to past
> time
> in documentation text at all. The premise with documentation for
> feature x 
> is that the feature x exists. After these patches have been accepted,
> the 
> reality is that ELC exists and time before does not matter so we
> don't 
> encumber documentation text with that era that has become irrelevant.
> 
While the choice of words is not correct, for me background is
important on why a feature is implemented.
Here, even after ELC is implemented, the majority of generations will still
not have this feature. Uncore is not just supported on TPMI systems.


> You might occasionally have to mention what is not possible without
> ELC 
> in case it's still possible to run stuff without ELC but don't put
> time 
> references to it. However, it's not something you should start with
> in
> the documentation text.
> 
> > The Efficiency Latency Control (ELC) feature is added to improve
> 
> "is added to improve" -> "improves"
Fine.

> 
> > performance per watt. With this feature hardware power management
> > algorithms optimize trade-off between latency and power
> > consumption.
> > But for some latency sensitive workloads further tuning can be done
> > from OS to get desired performance.
> 
> I'd just start with this paragraph. It goes straight into the point
> and 
> is good in that it tries to summarize what ELC tries to achieve.
There are so many features we have which improve perf/watt. Why ELC is
special needs some background.

> 
> > The hardware monitors the average CPU utilization across all cores
> 
> hardware or ELC-capable HW?
Hardware. Hardware always does this.

> 
> > in a power domain at regular intervals and decides a uncore
> > frequency. 
> 
> This kind of feels something that belongs to the first paragraph if
> it's 
> about ELC. (I'm left slightly unsure if ELC refers only to those
> controls 
> mentioned below, or if it is the automatic uncore freq control plus
> the 
> manual controls. I assume it's the latter because of "with this
> feature 
> hardware power management algorithms optimize" sentence.)
It is the latter. Hardware doesn't do a PM feature depending only on OS.

> 
> > While this may result in the best performance per watt, workload
> > may be
> > expecting higher performance at the expense of power. Consider an
> > application that intermittently wakes up to perform memory reads on
> > an
> > otherwise idle system. In such cases, if hardware lowers uncore
> > frequency, then there may be delay in ramp up of frequency to meet
> > target performance. 
> > 
> > The ELC control defines some parameters which can be changed from
> > OS.
> > If the average CPU utilization is below a user defined threshold
> > (elc_low_threshold_percent attribute below), the user defined
> > uncore
> > frequency floor frequency will be used (elc_floor_freq_khz
> > attribute 
> > below) instead of hardware calculated minimum. 
> > 
> > Similarly in high load scenario where the CPU utilization goes
> > above 
> > the high threshold value (elc_high_threshold_percent attribute
> > below) 
> > instead of jumping to maximum uncore frequency, uncore frequency is
> > increased in 100MHz steps until the power limit is reached.
> > 
> > Attributes for efficiency latency control: 
> > .. 
> > .. 
> 
> There were a few spaces at the end of lines, those should be removed.
Yes in the patch.

Thanks,
Srinivas

>
Ilpo Järvinen Aug. 27, 2024, 1:34 p.m. UTC | #5
On Tue, 27 Aug 2024, srinivas pandruvada wrote:
> On Tue, 2024-08-27 at 11:08 +0300, Ilpo Järvinen wrote:
> > On Mon, 26 Aug 2024, srinivas pandruvada wrote:
> > 
> > > On Fri, 2024-08-23 at 15:28 +0300, Ilpo Järvinen wrote:
> > > > On Wed, 21 Aug 2024, Tero Kristo wrote:
> > > > 
> > > > > Added documentation about the functionality of efficiency vs.
> > > > > latency tradeoff
> > > > > control in intel Xeon processors, and how this is configured
> > > > > via
> > > > > sysfs.
> > > > > 
> > > > > Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
> > > > > ---
> > > > >  .../pm/intel_uncore_frequency_scaling.rst     | 51
> > > > > +++++++++++++++++++⁠Ayoub, Hatim This seems that when on AC
> > > > > mode, Windows don't care about PC10. Is this correct? It seems
> > > > > that with EPB=6 we can
> > > > >  1 file changed, 51 insertions(+)
> > > > > 
> > > > > diff --git a/Documentation/admin-
> > > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > > b/Documentation/admin-
> > > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > > index 5ab3440e6cee..fb83aa2b744e 100644
> > > > > --- a/Documentation/admin-
> > > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > > +++ b/Documentation/admin-
> > > > > guide/pm/intel_uncore_frequency_scaling.rst
> > > > > @@ -113,3 +113,54 @@ to apply at each uncore* level.
> > > > >  
> > > > >  Support for "current_freq_khz" is available only at each
> > > > > fabric
> > > > > cluster
> > > > >  level (i.e., in uncore* directory).
> > > > > +
> > > > > +Efficiency vs. Latency Tradeoff
> > > > 
> > > > Does this section even cover the "tradeoff" part in its body? Why
> > > > not
> > > > call 
> > > > it directly "Control" after ELC?
> > > > 
> > > > > +-------------------------------
> > > > > +
> > > > > +In the realm of high-performance computing, particularly with
> > > > > Xeon
> > > > > +processors, managing uncore frequency is an important aspect
> > > > > of
> > > > > system
> > > > > +optimization. Traditionally, the uncore frequency is ramped up
> > > > > rapidly
> > > > > +in high load scenarios. While this strategy achieves low
> > > > > latency,
> > > > > which
> > > > > +is crucial for time-sensitive computations, it does not
> > > > > necessarily yield
> > > > > +the best performance per watt, —a key metric for energy
> > > > > efficiency
> > > > > and
> > > > > +operational cost savings.
> > > > 
> > > > This entire paragraph feels more prose or history book than
> > > > documentation 
> > > > text. I'd suggest using something that goes more directly into
> > > > the
> > > > point
> > > > about what ELC brings to the table (I suppose the goal is
> > > > "performance 
> > > > per watt" optimization, even that goal is only implied by the
> > > > current
> > > > text, not explicitly stated as the goal here).
> > > > 
> > > 
> > > What about this?
> > > 
> > > Traditionally, the uncore frequency is ramped up to reach the
> > > maximum 
> > > possible level based on parameters like EPB (Energy perf Bias) and
> > > other system power management settings programmed by BIOS.  While
> > > this
> > > strategy achieves low latency for latency sensitive applications,
> > > it
> > > does not necessarily yield the best performance per watt. 
> > 
> > This again starts with a wrong foot. Don't use words like
> > "traditionally",
> > "in the past", "historically", "is added", etc. that refer to past
> > time
> > in documentation text at all. The premise with documentation for
> > feature x 
> > is that the feature x exists. After these patches have been accepted,
> > the 
> > reality is that ELC exists and time before does not matter so we
> > don't 
> > encumber documentation text with that era that has become irrelevant.
> > 
> While the choice of words are not correct, for me background is
> important on why a feature is implemented.
> Here even after ELC is implemented, majority of generations will still
> not have this feature. Uncore is not just supported on TPMI systems.

I don't doubt there are plenty of systems without it, but you're 
supposed to document ELC, not non-ELC systems under this section.

If the system does not have ELC, this section has zero relevance for
the admin. (And it's a job for marketing people, not for Linux
documentation, to convince people they need this new and shiny
thing. :-))

Thus, the base assumption in this section is that ELC is supported and 
usable.

> > You might occasionally have to mention what is not possible without
> > ELC 
> > in case it's still possible to run stuff without ELC but don't put
> > time 
> > references to it. However, it's not something you should start with
> > in
> > the documentation text.
> > 
> > > The Efficiency Latency Control (ELC) feature is added to improve
> > 
> > "is added to improve" -> "improves"
> Fine.
> 
> > 
> > > performance per watt. With this feature hardware power management
> > > algorithms optimize trade-off between latency and power
> > > consumption.
> > > But for some latency sensitive workloads further tuning can be done
> > > from OS to get desired performance.
> > 
> > I'd just start with this paragraph. It goes straight into the point
> > and 
> > is good in that it tries to summarize what ELC tries to achieve.
>
> There are so many features we have which improves perf/watt.

I think you make an issue out of a non-issue if you think it's a problem to
state feature A improves performance if there are also features B and C 
that improve it.

> Why ELC is special needs some background.

"special"? I didn't get that impression at all. I'm far from convinced ELC 
is "special" or needs to be presented as such.

> > > The hardware monitors the average CPU utilization across all cores
> > 
> > hardware or ELC-capable HW?
> Hardware. hardware always does this.

So do I read you right, this is nothing ELC-specific? So all ELC does is
add those tunables (limits/overrides)?

> > > in a power domain at regular intervals and decides a uncore
> > > frequency. 
> > 
> > This kind of feels something that belongs to the first paragraph if
> > it's 
> > about ELC. (I'm left slightly unsure if ELC refers only to those
> > controls 
> > mentioned below, or if it is the automatic uncore freq control plus
> > the 
> > manual controls. I assume it's the latter because of "with this
> > feature 
> > hardware power management algorithms optimize" sentence.)
>
> It is later. Hardware doesn't do a PM feature depending only on OS.

I was specifically talking about what "ELC" is but you downgraded wording 
back to "hardware" in your reply. So I have to repeat myself: what does ELC
consist of? Are these non-OS dependent features (that "hardware always
does") part of ELC or not (for the avoidance of potential 
misunderstandings, this is assuming no ELC tunables added by this series 
are touched by the admin)?

Obviously, when one of the tunables is touched, it impacts the allowed
operating region of the autonomous algorithm (ELC tunables acting as 
what look like "overrides").

> > > While this may result in the best performance per watt, workload
> > > may be
> > > expecting higher performance at the expense of power. Consider an
> > > application that intermittently wakes up to perform memory reads on
> > > an
> > > otherwise idle system. In such cases, if hardware lowers uncore
> > > frequency, then there may be delay in ramp up of frequency to meet
> > > target performance. 
> > > 
> > > The ELC control defines some parameters which can be changed from
> > > OS.
> > > If the average CPU utilization is below a user defined threshold
> > > (elc_low_threshold_percent attribute below), the user defined
> > > uncore
> > > frequency floor frequency will be used (elc_floor_freq_khz
> > > attribute 
> > > below) instead of hardware calculated minimum. 
> > > 
> > > Similarly in high load scenario where the CPU utilization goes
> > > above 
> > > the high threshold value (elc_high_threshold_percent attribute
> > > below) 
> > > instead of jumping to maximum uncore frequency, uncore frequency is
> > > increased in 100MHz steps until the power limit is reached.
> > > 
> > > Attributes for efficiency latency control: 
> > > .. 
> > > .. 
> > 
> > > There were a few spaces at the end of lines, those should be removed.
> Yes in the patch.
> 
> Thanks,
> Srinivas
> 
> > 
>
Patch

diff --git a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
index 5ab3440e6cee..fb83aa2b744e 100644
--- a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
+++ b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
@@ -113,3 +113,54 @@  to apply at each uncore* level.
 
 Support for "current_freq_khz" is available only at each fabric cluster
 level (i.e., in uncore* directory).
+
+Efficiency vs. Latency Tradeoff
+-------------------------------
+
+In the realm of high-performance computing, particularly with Xeon
+processors, managing uncore frequency is an important aspect of system
+optimization. Traditionally, the uncore frequency is ramped up rapidly
+in high load scenarios. While this strategy achieves low latency, which
+is crucial for time-sensitive computations, it does not necessarily yield
+the best performance per watt, a key metric for energy efficiency and
+operational cost savings.
+
+The Efficiency vs. Latency Control (ELC) feature allows the user to influence
+the uncore frequency scaling algorithm. Hardware monitors the average CPU
+utilization across all cores at regular intervals. If the average CPU
+utilization is below a user-defined threshold (elc_low_threshold_percent),
+the user-defined uncore floor frequency (elc_floor_freq_khz) is used,
+minimizing latency. Similarly, in a high load scenario where the CPU
+utilization goes above the high threshold value
+(elc_high_threshold_percent), instead of jumping to the maximum uncore
+frequency, the uncore frequency is increased in 100MHz steps until the power
+limit is reached.
+
+Attributes for efficiency latency control:
+
+``elc_floor_freq_khz``
+	This attribute is used to get/set the efficiency latency floor frequency.
+	If this value is lower than 'min_freq_khz', it is ignored by
+	the firmware.
+
+``elc_low_threshold_percent``
+	This attribute is used to get/set the efficiency latency control low
+	threshold. The value is a percentage of CPU utilization.
+
+``elc_high_threshold_percent``
+	This attribute is used to get/set the efficiency latency control high
+	threshold. The value is a percentage of CPU utilization.
+
+``elc_high_threshold_enable``
+	This attribute is used to enable/disable the efficiency latency control
+	high threshold. Write '1' to enable, '0' to disable.
+
+The example system configuration below does the following:
+  * when CPU utilization is less than 10%: sets uncore frequency to 800MHz
+  * when CPU utilization is higher than 95%: increases uncore frequency in
+    100MHz steps, until power limit is reached
+
+  elc_floor_freq_khz:800000
+  elc_high_threshold_percent:95
+  elc_high_threshold_enable:1
+  elc_low_threshold_percent:10
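
The example configuration at the end of the patch could be applied from user space roughly as follows. This is only a minimal sketch, not part of the patch: the helper name `apply_elc_config` and the dictionary `ELC_EXAMPLE` are made up for illustration, and the caller is assumed to pass the path of an uncore* directory that actually exposes the four ELC attributes (writing them normally requires root). The attribute names and values are taken from the patch text.

```python
from pathlib import Path

# Attribute names and values copied from the example in the patch.
ELC_EXAMPLE = {
    "elc_floor_freq_khz": 800000,
    "elc_low_threshold_percent": 10,
    "elc_high_threshold_percent": 95,
    "elc_high_threshold_enable": 1,
}


def apply_elc_config(uncore_dir, config=ELC_EXAMPLE):
    """Write each ELC attribute under uncore_dir and return the values read back.

    uncore_dir is assumed to be an uncore* sysfs directory; the helper itself
    is hypothetical and simply treats the attributes as small text files.
    """
    uncore_dir = Path(uncore_dir)
    readback = {}
    for name, value in config.items():
        attr = uncore_dir / name
        if not attr.exists():
            # The attribute is missing, so ELC is probably not supported here.
            raise FileNotFoundError(f"{attr} not found; ELC may be unsupported")
        attr.write_text(f"{value}\n")
        # Read back what was stored; compare it with the request if needed.
        readback[name] = attr.read_text().strip()
    return readback
```

Calling `apply_elc_config()` with the chosen uncore* directory returns a dictionary of the values read back, which can be printed in the same `name:value` form the patch uses for its example.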