diff mbox series

[v4,12/15] xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op

Message ID 20230614180253.89958-13-jandryuk@gmail.com (mailing list archive)
State Superseded
Headers show
Series Intel Hardware P-States (HWP) support | expand

Commit Message

Jason Andryuk June 14, 2023, 6:02 p.m. UTC
Add SET_CPUFREQ_HWP xen_sysctl_pm_op to set HWP parameters.  The sysctl
supports setting multiple values simultaneously as indicated by the
set_params bits.  This allows atomically applying new HWP configuration
via a single wrmsr.

XEN_SYSCTL_HWP_SET_PRESET_BALANCE/PERFORMANCE/POWERSAVE provide three
common presets.  Setting them depends on hardware limits which the
hypervisor is already caching.  So using them allows skipping a
hypercall to query the limits (lowest/highest) to then set those same
values.  The code is organized to allow a preset to be refined with
additional stuff if desired.

"most_efficient" and "guaranteed" could be additional presets in the
future, but the are not added now.  Those levels can change at runtime,
but we don't have code in place to monitor and update for those events.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>

---
v4:
Remove IA32_ENERGY_BIAS support
Validate parameters don't exceed 255
Use CPPC/cppc name
set_cppc_para() add const
set_cppc_para() return hwp_cpufreq_target()
Expand sysctl comments

v3:
Remove cpufreq_governor_internal from set_cpufreq_hwp

v2:
Update for naming anonymous union
Drop hwp_err for invalid input in set_hwp_para()
Drop uint16_t cast in XEN_SYSCTL_HWP_SET_PARAM_MASK
Drop parens for HWP_SET_PRESET defines
Reference activity_window format comment
Place SET_CPUFREQ_HWP after SET_CPUFREQ_PARA
Add {HWP,IA32}_ENERGY_PERF_MAX_{PERFORMANCE,POWERSAVE} defines
Order defines before fields in sysctl.h
Use XEN_HWP_GOVERNOR
Use per_cpu for hwp_drv_data
---
 xen/arch/x86/acpi/cpufreq/hwp.c    | 98 ++++++++++++++++++++++++++++++
 xen/drivers/acpi/pmstat.c          | 17 ++++++
 xen/include/acpi/cpufreq/cpufreq.h |  2 +
 xen/include/public/sysctl.h        | 58 ++++++++++++++++++
 4 files changed, 175 insertions(+)

Comments

Jan Beulich June 21, 2023, 3:27 p.m. UTC | #1
(re-adding xen-devel@)

On 21.06.2023 16:16, Jason Andryuk wrote:
> On Mon, Jun 19, 2023 at 10:47 AM Jan Beulich <jbeulich@suse.com> wrote:
>> On 14.06.2023 20:02, Jason Andryuk wrote:
>>> +    if ( !(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW) &&
>>> +         set_cppc->activity_window )
>>> +        return -EINVAL;
> 
> There were a few aspects intended to be checked, but I have failed to
> implement them all properly and consistently.  The 32bit fields of the
> CPPC interface are larger than the 8 bit HWP fields (10 bits for
> activity window).  So the first aspect was supposed to ensure all
> those out-of-range bits are 0.
> 
> The second aspect, which wasn't implemented properly, was that fields
> would be 0 unless the corresponding bit was set in set_params.
> 
> The third aspect was to fail if a field was specified but hardware
> support isn't available.  That is now only activity window.
> 
> Do aspects #1 and #2 sound appropriate?  We can discuss #3 below.

Personally I'd prefer if inapplicable fields weren't checked, unless we
expect re-use of those fields with a different way of indicating that
the field holds an applicable value. But my primary desire is for
checking to be as consistent as possible.

>> Feels like I have wondered before what good this check does. I'm
>> inclined to suggest to ...
> 
> This check was supposed to enforce #2.
> 
>>> +    if ( set_cppc->activity_window & ~XEN_SYSCTL_CPPC_ACT_WINDOW_MASK )
>>> +        return -EINVAL;
>>
>> ... fold the two relevant checks, omitting the middle one:
>>
>>     if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW) &&
>>          (!feature_hwp_activity_window ||
>>           (set_cppc->activity_window & ~XEN_SYSCTL_CPPC_ACT_WINDOW_MASK))
>>         return -EINVAL;
>>
>> Yet I'm also a little worried about the feature check, requiring the
>> caller to first figure out whether that feature is available. Would
>> it be an alternative to make such "best effort", preferably with
>> some indication that this aspect of the request was not carried out?
> 
> Yes, it would be nice to try and apply on a "best effort" basis as
> it's only activity window which may not be supported.
> 
> The SDM says, "Processors may support a subset of IA32_HWP_REQUEST
> fields as indicated by CPUID. Reads of non-supported fields will
> return 0. Writes to non-supported fields are ignored."
> 
> I'll have to test this, but potentially we just let the writes go
> through?  If the user checks xenpm, they will see that the activity
> window isn't supported?  Hmmm, I don't have a machine without activity
> window support, so I can't test it.  Skylake introduced HWP, but my
> skylake test system supports activity window.
> 
> Or do you want to make xen_set_cppc_para have an in/out and return the
> applied features?

Yes, that was what I meant with "indication of some sort". You could
e.g. simply clear the respective control bit in the request (and then
arrange for it to be copied back).

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/acpi/cpufreq/hwp.c b/xen/arch/x86/acpi/cpufreq/hwp.c
index 86c5793266..3ee046940c 100644
--- a/xen/arch/x86/acpi/cpufreq/hwp.c
+++ b/xen/arch/x86/acpi/cpufreq/hwp.c
@@ -23,6 +23,10 @@  static bool __ro_after_init feature_hdc;
 bool __initdata opt_cpufreq_hwp;
 static bool __ro_after_init opt_cpufreq_hdc = true;
 
+#define HWP_ENERGY_PERF_MAX_PERFORMANCE 0
+#define HWP_ENERGY_PERF_BALANCE         0x80
+#define HWP_ENERGY_PERF_MAX_POWERSAVE   0xff
+
 union hwp_request
 {
     struct
@@ -560,6 +564,100 @@  int get_hwp_para(const unsigned int cpu,
     return 0;
 }
 
+int set_hwp_para(struct cpufreq_policy *policy,
+                 const struct xen_set_cppc_para *set_cppc)
+{
+    unsigned int cpu = policy->cpu;
+    struct hwp_drv_data *data = per_cpu(hwp_drv_data, cpu);
+
+    if ( data == NULL )
+        return -EINVAL;
+
+    /* Validate all parameters first */
+    if ( set_cppc->set_params & ~XEN_SYSCTL_CPPC_SET_PARAM_MASK )
+        return -EINVAL;
+
+    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW &&
+         !feature_hwp_activity_window )
+        return -EINVAL;
+
+    if ( !(set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW) &&
+         set_cppc->activity_window )
+        return -EINVAL;
+
+    if ( set_cppc->activity_window & ~XEN_SYSCTL_CPPC_ACT_WINDOW_MASK )
+        return -EINVAL;
+
+    if ( (set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED) &&
+         set_cppc->desired != 0 &&
+         (set_cppc->desired < data->hw.lowest ||
+          set_cppc->desired > data->hw.highest) )
+        return -EINVAL;
+
+    /*
+     * minimum & maximum are not validated against lowest or highest as
+     * hardware doesn't seem to care and the SDM says CPUs will clip
+     * internally.
+     */
+    if ( set_cppc->minimum > 255 ||
+         set_cppc->maximum > 255 ||
+         set_cppc->energy_perf > 255 )
+        return -EINVAL;
+
+    /* Apply presets */
+    switch ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_PRESET_MASK )
+    {
+    case XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE:
+        data->minimum = data->hw.lowest;
+        data->maximum = data->hw.lowest;
+        data->activity_window = 0;
+        data->energy_perf = HWP_ENERGY_PERF_MAX_POWERSAVE;
+        data->desired = 0;
+        break;
+
+    case XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE:
+        data->minimum = data->hw.highest;
+        data->maximum = data->hw.highest;
+        data->activity_window = 0;
+        data->energy_perf = HWP_ENERGY_PERF_MAX_PERFORMANCE;
+        data->desired = 0;
+        break;
+
+    case XEN_SYSCTL_CPPC_SET_PRESET_BALANCE:
+        data->minimum = data->hw.lowest;
+        data->maximum = data->hw.highest;
+        data->activity_window = 0;
+        data->energy_perf = HWP_ENERGY_PERF_BALANCE;
+        data->desired = 0;
+        break;
+
+    case XEN_SYSCTL_CPPC_SET_PRESET_NONE:
+        break;
+
+    default:
+        return -EINVAL;
+    }
+
+    /* Further customize presets if needed */
+    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MINIMUM )
+        data->minimum = set_cppc->minimum;
+
+    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_MAXIMUM )
+        data->maximum = set_cppc->maximum;
+
+    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ENERGY_PERF )
+        data->energy_perf = set_cppc->energy_perf;
+
+    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_DESIRED )
+        data->desired = set_cppc->desired;
+
+    if ( set_cppc->set_params & XEN_SYSCTL_CPPC_SET_ACT_WINDOW )
+        data->activity_window = set_cppc->activity_window &
+                                XEN_SYSCTL_CPPC_ACT_WINDOW_MASK;
+
+    return hwp_cpufreq_target(policy, 0, 0);
+}
+
 int __init hwp_register_driver(void)
 {
     return cpufreq_register_driver(&hwp_cpufreq_driver);
diff --git a/xen/drivers/acpi/pmstat.c b/xen/drivers/acpi/pmstat.c
index 10143c084c..e4482bcdbc 100644
--- a/xen/drivers/acpi/pmstat.c
+++ b/xen/drivers/acpi/pmstat.c
@@ -402,6 +402,19 @@  static int set_cpufreq_para(struct xen_sysctl_pm_op *op)
     return ret;
 }
 
+static int set_cpufreq_cppc(const struct xen_sysctl_pm_op *op)
+{
+    struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_policy, op->cpuid);
+
+    if ( !policy || !policy->governor )
+        return -EINVAL;
+
+    if ( !hwp_active() )
+        return -EINVAL;
+
+    return set_hwp_para(policy, &op->u.set_cppc);
+}
+
 int do_pm_op(struct xen_sysctl_pm_op *op)
 {
     int ret = 0;
@@ -474,6 +487,10 @@  int do_pm_op(struct xen_sysctl_pm_op *op)
         break;
     }
 
+    case SET_CPUFREQ_CPPC:
+        ret = set_cpufreq_cppc(op);
+        break;
+
     case GET_CPUFREQ_AVGFREQ:
     {
         op->u.get_avgfreq = cpufreq_driver_getavg(op->cpuid, USR_GETAVG);
diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h
index fb655fac14..d757da2ef4 100644
--- a/xen/include/acpi/cpufreq/cpufreq.h
+++ b/xen/include/acpi/cpufreq/cpufreq.h
@@ -250,5 +250,7 @@  extern bool __initdata opt_cpufreq_hwp;
 int hwp_cmdline_parse(const char *s);
 int get_hwp_para(const unsigned int cpu,
                  struct xen_cppc_para *cppc_para);
+int set_hwp_para(struct cpufreq_policy *policy,
+                 const struct xen_set_cppc_para *set_cppc);
 
 #endif /* __XEN_CPUFREQ_PM_H__ */
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index 5c9b60b5d0..96c58eb998 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -351,6 +351,62 @@  struct xen_cppc_para {
     uint32_t activity_window;
 };
 
+/*
+ * Set CPPC values.
+ *
+ * Configured the parameters for CPPC.  Set bits in set_params control which
+ * values are applied.
+ *
+ * There are a set of presets along with individual fields.  Presets are
+ * applied first, and then individual fields.  This allows customizing
+ * a preset without having to specify every value.
+ *
+ * The preset options values are as follows:
+ *
+ * preset      | minimum | maxium  | energy_perf
+ * ------------+---------+---------+----------------
+ * powersave   | lowest  | lowest  | powersave (255)
+ * ------------+---------+---------+----------------
+ * balance     | lowest  | highest | balance (128)
+ * ------------+---------+---------+----------------
+ * performance | highest | highest | performance (0)
+ *
+ * desired and activity_window are set to 0, hardware selected.
+ */
+struct xen_set_cppc_para {
+#define XEN_SYSCTL_CPPC_SET_DESIRED              (1U << 0)
+#define XEN_SYSCTL_CPPC_SET_ENERGY_PERF          (1U << 1)
+#define XEN_SYSCTL_CPPC_SET_ACT_WINDOW           (1U << 2)
+#define XEN_SYSCTL_CPPC_SET_MINIMUM              (1U << 3)
+#define XEN_SYSCTL_CPPC_SET_MAXIMUM              (1U << 4)
+#define XEN_SYSCTL_CPPC_SET_PRESET_MASK          0xf000
+#define XEN_SYSCTL_CPPC_SET_PRESET_NONE          0x0000
+#define XEN_SYSCTL_CPPC_SET_PRESET_BALANCE       0x1000
+#define XEN_SYSCTL_CPPC_SET_PRESET_POWERSAVE     0x2000
+#define XEN_SYSCTL_CPPC_SET_PRESET_PERFORMANCE   0x3000
+#define XEN_SYSCTL_CPPC_SET_PARAM_MASK ( \
+                                  XEN_SYSCTL_CPPC_SET_PRESET_MASK | \
+                                  XEN_SYSCTL_CPPC_SET_DESIRED     | \
+                                  XEN_SYSCTL_CPPC_SET_ENERGY_PERF | \
+                                  XEN_SYSCTL_CPPC_SET_ACT_WINDOW  | \
+                                  XEN_SYSCTL_CPPC_SET_MINIMUM     | \
+                                  XEN_SYSCTL_CPPC_SET_MAXIMUM     )
+    uint32_t set_params; /* bitflags for valid values */
+    /* See comments in struct xen_cppc_para. */
+    uint32_t minimum;
+    uint32_t maximum;
+    uint32_t desired;
+    /*
+     * Hint to hardware for energy/performance preference.
+     * 0:   Performance
+     * 128: Balance (Default)
+     * 255: Powersaving
+     */
+    uint32_t energy_perf;
+#define XEN_SYSCTL_CPPC_ACT_WINDOW_MASK          0x03ff
+    uint32_t activity_window;
+};
+
 #define XEN_HWP_DRIVER "hwp"
 /*
  * cpufreq para name of this structure named
@@ -417,6 +473,7 @@  struct xen_sysctl_pm_op {
     #define SET_CPUFREQ_GOV            (CPUFREQ_PARA | 0x02)
     #define SET_CPUFREQ_PARA           (CPUFREQ_PARA | 0x03)
     #define GET_CPUFREQ_AVGFREQ        (CPUFREQ_PARA | 0x04)
+    #define SET_CPUFREQ_CPPC           (CPUFREQ_PARA | 0x05)
 
     /* set/reset scheduler power saving option */
     #define XEN_SYSCTL_pm_op_set_sched_opt_smt    0x21
@@ -443,6 +500,7 @@  struct xen_sysctl_pm_op {
         struct xen_get_cpufreq_para get_para;
         struct xen_set_cpufreq_gov  set_gov;
         struct xen_set_cpufreq_para set_para;
+        struct xen_set_cppc_para    set_cppc;
         uint64_aligned_t get_avgfreq;
         uint32_t                    set_sched_opt_smt;
 #define XEN_SYSCTL_CX_UNLIMITED 0xffffffff