[2/2] PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface

Message ID 1762575.ER2xjzr9E1@aspire.rjw.lan
State Accepted, archived
Delegated to: Rafael Wysocki
Series
  • PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS handling fixes and sysfs i/f

Commit Message

Rafael J. Wysocki March 21, 2019, 10:20 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The Performance and Energy Bias Hint (EPB) is expected to be set by
user space through the generic MSR interface, but that interface is
not particularly nice and there are security concerns regarding it,
so it is not always available.

For this reason, add a sysfs interface for reading and updating the
EPB, in the form of a new attribute, energy_perf_bias, located
under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
support the EPB feature.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |   18 ++++
 Documentation/admin-guide/pm/intel_epb.rst         |   27 ++++++
 arch/x86/kernel/cpu/intel_epb.c                    |   93 ++++++++++++++++++++-
 3 files changed, 134 insertions(+), 4 deletions(-)
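
For illustration only, user space could read and update the new attribute
roughly as sketched below. The helper names epb_read() and epb_write() are
hypothetical and not part of the patch; only the attribute location and the
accepted values (a number in 0 - 15 or one of the named strings) come from
the description above. Writing the attribute typically requires root.

```c
#include <stdio.h>

/*
 * Read the numeric value from an energy_perf_bias attribute file.
 * Returns the value on success, or -1 if the file cannot be read or parsed.
 * (Hypothetical helper, not part of the patch.)
 */
long long epb_read(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned long long val;
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%llu", &val) == 1;
	fclose(f);

	return ok ? (long long)val : -1;
}

/*
 * Write a setting, either a number such as "6" or a named hint such as
 * "balance-power".  Returns 0 on success, -1 on failure.  For sysfs files
 * a rejected value is usually reported when the buffer is flushed, so the
 * fclose() result is checked as well.
 */
int epb_write(const char *path, const char *setting)
{
	FILE *f = fopen(path, "w");
	int ret;

	if (!f)
		return -1;

	ret = fprintf(f, "%s\n", setting) < 0 ? -1 : 0;
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

A caller might, for example, pass
"/sys/devices/system/cpu/cpu0/power/energy_perf_bias" as the path and
"balance-power" as the setting, then call epb_read() on the same path to
see what the CPU reports.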

Comments

Hannes Reinecke March 22, 2019, 9:03 a.m. UTC | #1
On 3/21/19 11:20 PM, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The Performance and Energy Bias Hint (EPB) is expected to be set by
> user space through the generic MSR interface, but that interface is
> not particularly nice and there are security concerns regarding it,
> so it is not always available.
> 
> For this reason, add a sysfs interface for reading and updating the
> EPB, in the form of a new attribute, energy_perf_bias, located
> under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> support the EPB feature.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   Documentation/ABI/testing/sysfs-devices-system-cpu |   18 ++++
>   Documentation/admin-guide/pm/intel_epb.rst         |   27 ++++++
>   arch/x86/kernel/cpu/intel_epb.c                    |   93 ++++++++++++++++++++-
>   3 files changed, 134 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
Borislav Petkov March 22, 2019, 2:46 p.m. UTC | #2
First of all, thanks a lot for doing that!

This is a good example for how we should convert all the /dev/msr
accessing tools.

Nitpicks below.

On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The Performance and Energy Bias Hint (EPB) is expected to be set by
> user space through the generic MSR interface, but that interface is
> not particularly nice and there are security concerns regarding it,
> so it is not always available.
> 
> For this reason, add a sysfs interface for reading and updating the
> EPB, in the form of a new attribute, energy_perf_bias, located
> under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> support the EPB feature.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu |   18 ++++
>  Documentation/admin-guide/pm/intel_epb.rst         |   27 ++++++
>  arch/x86/kernel/cpu/intel_epb.c                    |   93 ++++++++++++++++++++-
>  3 files changed, 134 insertions(+), 4 deletions(-)

...

> +static ssize_t energy_perf_bias_show(struct device *dev,
> +				     struct device_attribute *attr,
> +				     char *buf)
> +{
> +	unsigned int cpu = dev->id;
> +	u64 epb;
> +	int ret;
> +
> +	ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);

That's an IPI and an MSR read each time. You could dump saved_epb
instead, no?

> +	if (ret < 0)
> +		return ret;
> +
> +	return sprintf(buf, "%llu\n", epb);
> +}
> +
> +static ssize_t energy_perf_bias_store(struct device *dev,
> +				      struct device_attribute *attr,
> +				      const char *buf, size_t count)
> +{
> +	unsigned int cpu = dev->id;
> +	u64 epb, val;
> +	int ret;
> +
> +	ret = __sysfs_match_string(energy_perf_strings,
> +				   ARRAY_SIZE(energy_perf_strings), buf);
> +	if (ret >= 0)
> +		val = energ_perf_values[ret];
> +	else if (kstrtou64(buf, 0, &val) || val > MAX_EPB)

Range is 0 - 15 but u64? Maybe make it an u8? :)

> +		return -EINVAL;
> +
> +	ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> +			    (epb & ~EPB_MASK) | val);
> +	if (ret < 0)
> +		return ret;
> +
> +	return count;
> +}
Peter Zijlstra March 22, 2019, 3 p.m. UTC | #3
On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> +	ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> +			    (epb & ~EPB_MASK) | val);

That's two back-to-back IPIs and a giant waste.

If you'd use a proper msr shadow variable, you'd not have to do the
rdmsr_on_cpu :-)
Rafael J. Wysocki March 25, 2019, 9:56 a.m. UTC | #4
On Fri, Mar 22, 2019 at 4:00 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> > +     ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> > +                         (epb & ~EPB_MASK) | val);
>
> That's two back-to-back IPIs and a giant waste.

Giant with respect to what?

I know that the read can be avoided if more MSR bits are stored in
memory, but I don't expect this i/f to be used very often (once per
boot maybe, or on AC<->DC changes at most), so I didn't think that this
would be a good tradeoff.

> If you'd use a proper msr shadow variable, you'd not have to do the
> rdmsr_on_cpu :-)

Not really.

The MSR can be updated from elsewhere which is not controlled by this code.
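
To make the read-modify-write under discussion concrete, here is a small
stand-alone sketch of the bit manipulation, written as ordinary user-space C
rather than kernel code. EPB_MASK is taken from the patch; the function name
epb_merge() is hypothetical. Unlike the patch, which range-checks val before
the write, this sketch also masks val defensively.

```c
#include <stdint.h>

#define EPB_MASK 0x0fULL	/* low four bits of the MSR hold the EPB */

/*
 * Return a new MSR image in which only the EPB field is replaced by val
 * and all other bits of the current MSR value are preserved.  This is why
 * the store path needs the current register contents before writing.
 * (Hypothetical helper for illustration.)
 */
uint64_t epb_merge(uint64_t msr, uint64_t val)
{
	return (msr & ~EPB_MASK) | (val & EPB_MASK);
}
```

If the upper bits were known to be constant, the read could be dropped; the
reply above argues that they cannot be assumed so, because the MSR may be
changed by code outside this file.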
Rafael J. Wysocki March 25, 2019, 10:01 a.m. UTC | #5
On Fri, Mar 22, 2019 at 3:46 PM Borislav Petkov <bp@alien8.de> wrote:
>
> First of all, thanks a lot for doing that!
>
> This is a good example for how we should convert all the /dev/msr
> accessing tools.
>
> Nitpicks below.
>
> On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The Performance and Energy Bias Hint (EPB) is expected to be set by
> > user space through the generic MSR interface, but that interface is
> > not particularly nice and there are security concerns regarding it,
> > so it is not always available.
> >
> > For this reason, add a sysfs interface for reading and updating the
> > EPB, in the form of a new attribute, energy_perf_bias, located
> > under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> > support the EPB feature.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-devices-system-cpu |   18 ++++
> >  Documentation/admin-guide/pm/intel_epb.rst         |   27 ++++++
> >  arch/x86/kernel/cpu/intel_epb.c                    |   93 ++++++++++++++++++++-
> >  3 files changed, 134 insertions(+), 4 deletions(-)
>
> ...
>
> > +static ssize_t energy_perf_bias_show(struct device *dev,
> > +                                  struct device_attribute *attr,
> > +                                  char *buf)
> > +{
> > +     unsigned int cpu = dev->id;
> > +     u64 epb;
> > +     int ret;
> > +
> > +     ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
>
> That's an IPI and an MSR read each time. You could dump saved_epb
> instead, no?

No, because the MSR can sometimes change in ways beyond the control of this code.

Generally, saved_epb is only the right value at CPU online time.

> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     return sprintf(buf, "%llu\n", epb);
> > +}
> > +
> > +static ssize_t energy_perf_bias_store(struct device *dev,
> > +                                   struct device_attribute *attr,
> > +                                   const char *buf, size_t count)
> > +{
> > +     unsigned int cpu = dev->id;
> > +     u64 epb, val;
> > +     int ret;
> > +
> > +     ret = __sysfs_match_string(energy_perf_strings,
> > +                                ARRAY_SIZE(energy_perf_strings), buf);
> > +     if (ret >= 0)
> > +             val = energ_perf_values[ret];
> > +     else if (kstrtou64(buf, 0, &val) || val > MAX_EPB)
>
> Range is 0 - 15 but u64? Maybe make it an u8? :)

At the cost of an extra conversion below, right?

> > +             return -EINVAL;
> > +
> > +     ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
> > +                         (epb & ~EPB_MASK) | val);
> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     return count;
> > +}
>
> --
Borislav Petkov March 25, 2019, 11:32 a.m. UTC | #6
On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The Performance and Energy Bias Hint (EPB) is expected to be set by
> user space through the generic MSR interface, but that interface is
> not particularly nice and there are security concerns regarding it,
> so it is not always available.
> 
> For this reason, add a sysfs interface for reading and updating the
> EPB, in the form of a new attribute, energy_perf_bias, located
> under /sys/devices/system/cpu/cpu#/power/ for online CPUs that
> support the EPB feature.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu |   18 ++++
>  Documentation/admin-guide/pm/intel_epb.rst         |   27 ++++++
>  arch/x86/kernel/cpu/intel_epb.c                    |   93 ++++++++++++++++++++-
>  3 files changed, 134 insertions(+), 4 deletions(-)

Acked-by: Borislav Petkov <bp@suse.de>
Ido Schimmel May 9, 2019, 10:23 a.m. UTC | #7
On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> +static struct attribute *intel_epb_attrs[] = {
> +	&dev_attr_energy_perf_bias.attr,
> +	NULL
> +};
> +
> +static const struct attribute_group intel_epb_attr_group = {
> +	.name = power_group_name,
> +	.attrs =  intel_epb_attrs
> +};
> +
>  static int intel_epb_online(unsigned int cpu)
>  {
> +	struct device *cpu_dev = get_cpu_device(cpu);
> +
>  	intel_epb_restore();
> +	if (!cpuhp_tasks_frozen)
> +		sysfs_merge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> +
>  	return 0;
>  }
>  
>  static int intel_epb_offline(unsigned int cpu)
>  {
> -	return intel_epb_save();
> +	struct device *cpu_dev = get_cpu_device(cpu);
> +
> +	if (!cpuhp_tasks_frozen)
> +		sysfs_unmerge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> +
> +	intel_epb_save();
> +	return 0;
>  }

Hi,

I just booted net-next and got the following NULL pointer dereference
[1] during boot. I believe it is caused by this patch.

CONFIG_PM is disabled in my config which means 'power_group_name' is
defined as NULL. When I enable CONFIG_PM the issue is not reproduced.

Thanks

[1]
[    1.230241] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    1.231043] #PF: supervisor read access in kernel mode
[    1.231043] #PF: error_code(0x0000) - not-present page
[    1.231043] PGD 0 P4D 0
[    1.231043] Oops: 0000 [#1] SMP
[    1.231043] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 5.1.0-custom-07273-g80f232121b69 #1392
[    1.231043] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[    1.231043] RIP: 0010:strlen+0x0/0x20
[    1.231043] Code: b5 20 75 eb c6 42 01 00 0f b6 10 f6 82 40 bf 4d b5 20 74 14 48 c7 c1 40 bf 4d b5 48 83 c0 01 0f b6 10 f6 04 11 20 75 f3 c3 90 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[    1.231043] RSP: 0000:ffffb587c0cd3dc8 EFLAGS: 00010246
[    1.231043] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
[    1.231043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    1.231043] RBP: 0000000000000000 R08: ffff8e6137a160c8 R09: 0000000000000000
[    1.231043] R10: 0000000000000000 R11: ffff8e613652ec80 R12: 0000000000000000
[    1.231043] R13: 0000000000000000 R14: ffff8e6137a160c8 R15: ffffffffb4690120
[    1.231043] FS:  0000000000000000(0000) GS:ffff8e6137a00000(0000) knlGS:0000000000000000
[    1.231043] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.231043] CR2: 0000000000000000 CR3: 0000000200409000 CR4: 00000000001006f0
[    1.231043] Call Trace:
[    1.231043]  kernfs_name_hash+0xd/0x80
[    1.231043]  kernfs_find_ns+0x30/0xc0
[    1.231043]  kernfs_find_and_get_ns+0x27/0x50
[    1.231043]  sysfs_merge_group+0x2e/0x100
[    1.231043]  ? __switch_to_asm+0x40/0x70
[    1.231043]  intel_epb_online+0x2a/0x30
[    1.231043]  cpuhp_invoke_callback+0x8f/0x550
[    1.231043]  ? sort_range+0x20/0x20
[    1.231043]  cpuhp_thread_fun+0x9b/0x100
[    1.231043]  smpboot_thread_fn+0xc0/0x160
[    1.231043]  kthread+0x10d/0x130
[    1.231043]  ? __kthread_create_on_node+0x180/0x180
[    1.231043]  ret_from_fork+0x35/0x40
[    1.231043] CR2: 0000000000000000
[    1.231043] ---[ end trace c8ea60276791261c ]---
[    1.231043] RIP: 0010:strlen+0x0/0x20
[    1.231043] Code: b5 20 75 eb c6 42 01 00 0f b6 10 f6 82 40 bf 4d b5 20 74 14 48 c7 c1 40 bf 4d b5 48 83 c0 01 0f b6 10 f6 04 11 20 75 f3 c3 90 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[    1.231043] RSP: 0000:ffffb587c0cd3dc8 EFLAGS: 00010246
[    1.231043] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
[    1.231043] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    1.231043] RBP: 0000000000000000 R08: ffff8e6137a160c8 R09: 0000000000000000
[    1.231043] R10: 0000000000000000 R11: ffff8e613652ec80 R12: 0000000000000000
[    1.231043] R13: 0000000000000000 R14: ffff8e6137a160c8 R15: ffffffffb4690120
[    1.231043] FS:  0000000000000000(0000) GS:ffff8e6137a00000(0000) knlGS:0000000000000000
[    1.231043] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.231043] CR2: 0000000000000000 CR3: 0000000200409000 CR4: 00000000001006f0
Rafael J. Wysocki May 9, 2019, 5:18 p.m. UTC | #8
On Thursday, May 9, 2019 12:23:15 PM CEST Ido Schimmel wrote:
> On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> > +static struct attribute *intel_epb_attrs[] = {
> > +	&dev_attr_energy_perf_bias.attr,
> > +	NULL
> > +};
> > +
> > +static const struct attribute_group intel_epb_attr_group = {
> > +	.name = power_group_name,
> > +	.attrs =  intel_epb_attrs
> > +};
> > +
> >  static int intel_epb_online(unsigned int cpu)
> >  {
> > +	struct device *cpu_dev = get_cpu_device(cpu);
> > +
> >  	intel_epb_restore();
> > +	if (!cpuhp_tasks_frozen)
> > +		sysfs_merge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> > +
> >  	return 0;
> >  }
> >  
> >  static int intel_epb_offline(unsigned int cpu)
> >  {
> > -	return intel_epb_save();
> > +	struct device *cpu_dev = get_cpu_device(cpu);
> > +
> > +	if (!cpuhp_tasks_frozen)
> > +		sysfs_unmerge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> > +
> > +	intel_epb_save();
> > +	return 0;
> >  }
> 
> Hi,
> 
> I just booted net-next and got the following NULL pointer dereference
> [1] during boot. I believe it is caused by this patch.

I think you're right, sorry about this.

> CONFIG_PM is disabled in my config which means 'power_group_name' is
> defined as NULL. When I enable CONFIG_PM the issue is not reproduced.

So does the patch below fix it for you?

---
 arch/x86/kernel/cpu/intel_epb.c |   22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/intel_epb.c
+++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
@@ -97,6 +97,7 @@ static void intel_epb_restore(void)
 	wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val);
 }
 
+#ifdef CONFIG_PM
 static struct syscore_ops intel_epb_syscore_ops = {
 	.suspend = intel_epb_save,
 	.resume = intel_epb_restore,
@@ -193,6 +194,25 @@ static int intel_epb_offline(unsigned in
 	return 0;
 }
 
+static inline void register_intel_ebp_syscore_ops(void)
+{
+	register_syscore_ops(&intel_epb_syscore_ops);
+}
+#else /* !CONFIG_PM */
+static int intel_epb_online(unsigned int cpu)
+{
+	intel_epb_restore();
+	return 0;
+}
+
+static int intel_epb_offline(unsigned int cpu)
+{
+	return intel_epb_save();
+}
+
+static inline void register_intel_ebp_syscore_ops(void) {}
+#endif
+
 static __init int intel_epb_init(void)
 {
 	int ret;
@@ -206,7 +226,7 @@ static __init int intel_epb_init(void)
 	if (ret < 0)
 		goto err_out_online;
 
-	register_syscore_ops(&intel_epb_syscore_ops);
+	register_intel_ebp_syscore_ops();
 	return 0;
 
 err_out_online:
Ido Schimmel May 9, 2019, 5:43 p.m. UTC | #9
On Thu, May 09, 2019 at 07:18:28PM +0200, Rafael J. Wysocki wrote:
> So does the patch below fix it for you?

Yes. Thanks for the fix. Feel free to add my tag:

Tested-by: Ido Schimmel <idosch@mellanox.com>

Patch

Index: linux-pm/arch/x86/kernel/cpu/intel_epb.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/intel_epb.c
+++ linux-pm/arch/x86/kernel/cpu/intel_epb.c
@@ -9,8 +9,12 @@ 
  */
 
 #include <linux/cpuhotplug.h>
+#include <linux/cpu.h>
+#include <linux/device.h>
 #include <linux/kernel.h>
+#include <linux/string.h>
 #include <linux/syscore_ops.h>
+#include <linux/pm.h>
 
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
@@ -20,9 +24,9 @@ 
  *
  * The Performance and Energy Bias Hint (EPB) allows software to specify its
  * preference with respect to the power-performance tradeoffs present in the
- * processor.  Generally, the EPB is expected to be set by user space through
- * the generic MSR interface (with the help of the x86_energy_perf_policy tool),
- * but there are two reasons for the kernel to touch it.
+ * processor.  Generally, the EPB is expected to be set by user space (directly
+ * via sysfs or with the help of the x86_energy_perf_policy tool), but there are
+ * two reasons for the kernel to update it.
  *
  * First, there are systems where the platform firmware resets the EPB during
  * system-wide transitions from sleep states back into the working state
@@ -52,6 +56,7 @@  static DEFINE_PER_CPU(u8, saved_epb);
 
 #define EPB_MASK	0x0fULL
 #define EPB_SAVED	0x10ULL
+#define MAX_EPB		EPB_MASK
 
 static int intel_epb_save(void)
 {
@@ -97,15 +102,95 @@  static struct syscore_ops intel_epb_sysc
 	.resume = intel_epb_restore,
 };
 
+static const char * const energy_perf_strings[] = {
+	"performance",
+	"balance-performance",
+	"normal",
+	"balance-power",
+	"power"
+};
+static const u8 energ_perf_values[] = {
+	ENERGY_PERF_BIAS_PERFORMANCE,
+	ENERGY_PERF_BIAS_BALANCE_PERFORMANCE,
+	ENERGY_PERF_BIAS_NORMAL,
+	ENERGY_PERF_BIAS_BALANCE_POWERSAVE,
+	ENERGY_PERF_BIAS_POWERSAVE
+};
+
+static ssize_t energy_perf_bias_show(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	unsigned int cpu = dev->id;
+	u64 epb;
+	int ret;
+
+	ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
+	if (ret < 0)
+		return ret;
+
+	return sprintf(buf, "%llu\n", epb);
+}
+
+static ssize_t energy_perf_bias_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	unsigned int cpu = dev->id;
+	u64 epb, val;
+	int ret;
+
+	ret = __sysfs_match_string(energy_perf_strings,
+				   ARRAY_SIZE(energy_perf_strings), buf);
+	if (ret >= 0)
+		val = energ_perf_values[ret];
+	else if (kstrtou64(buf, 0, &val) || val > MAX_EPB)
+		return -EINVAL;
+
+	ret = rdmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS, &epb);
+	if (ret < 0)
+		return ret;
+
+	ret = wrmsrl_on_cpu(cpu, MSR_IA32_ENERGY_PERF_BIAS,
+			    (epb & ~EPB_MASK) | val);
+	if (ret < 0)
+		return ret;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(energy_perf_bias);
+
+static struct attribute *intel_epb_attrs[] = {
+	&dev_attr_energy_perf_bias.attr,
+	NULL
+};
+
+static const struct attribute_group intel_epb_attr_group = {
+	.name = power_group_name,
+	.attrs =  intel_epb_attrs
+};
+
 static int intel_epb_online(unsigned int cpu)
 {
+	struct device *cpu_dev = get_cpu_device(cpu);
+
 	intel_epb_restore();
+	if (!cpuhp_tasks_frozen)
+		sysfs_merge_group(&cpu_dev->kobj, &intel_epb_attr_group);
+
 	return 0;
 }
 
 static int intel_epb_offline(unsigned int cpu)
 {
-	return intel_epb_save();
+	struct device *cpu_dev = get_cpu_device(cpu);
+
+	if (!cpuhp_tasks_frozen)
+		sysfs_unmerge_group(&cpu_dev->kobj, &intel_epb_attr_group);
+
+	intel_epb_save();
+	return 0;
 }
 
 static __init int intel_epb_init(void)
Index: linux-pm/Documentation/admin-guide/pm/intel_epb.rst
===================================================================
--- linux-pm.orig/Documentation/admin-guide/pm/intel_epb.rst
+++ linux-pm/Documentation/admin-guide/pm/intel_epb.rst
@@ -4,3 +4,30 @@  Intel Performance and Energy Bias Hint
 
 .. kernel-doc:: arch/x86/kernel/cpu/intel_epb.c
    :doc: overview
+
+Intel Performance and Energy Bias Attribute in ``sysfs``
+========================================================
+
+The Intel Performance and Energy Bias Hint (EPB) value for a given (logical) CPU
+can be checked or updated through a ``sysfs`` attribute (file) under
+:file:`/sys/devices/system/cpu/cpu<N>/power/`, where the CPU number ``<N>``
+is allocated at the system initialization time:
+
+``energy_perf_bias``
+	Shows the current EPB value for the CPU in a sliding scale 0 - 15, where
+	a value of 0 corresponds to a hint preference for highest performance
+	and a value of 15 corresponds to the maximum energy savings.
+
+	In order to update the EPB value for the CPU, this attribute can be
+	written to, either with a number in the 0 - 15 sliding scale above, or
+	with one of the strings: "performance", "balance-performance", "normal",
+	"balance-power", "power" that represent values reflected by their
+	meaning.
+
+	This attribute is present for all online CPUs supporting the EPB
+	feature.
+
+Note that while the EPB interface to the processor is defined at the logical CPU
+level, the physical register backing it may be shared by multiple CPUs (for
+example, SMT siblings or cores in one package).  For this reason, updating the
+EPB value for one CPU may cause the EPB values for other CPUs to change.
Index: linux-pm/Documentation/ABI/testing/sysfs-devices-system-cpu
===================================================================
--- linux-pm.orig/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ linux-pm/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -518,3 +518,21 @@  Description:	Control Symetric Multi Thre
 
 			 If control status is "forceoff" or "notsupported" writes
 			 are rejected.
+
+What:		/sys/devices/system/cpu/cpu#/power/energy_perf_bias
+Date:		March 2019
+Contact:	linux-pm@vger.kernel.org
+Description:	Intel Energy and Performance Bias Hint (EPB)
+
+		EPB for the given CPU in a sliding scale 0 - 15, where a value
+		of 0 corresponds to a hint preference for highest performance
+		and a value of 15 corresponds to the maximum energy savings.
+
+		In order to change the EPB value for the CPU, write either
+		a number in the 0 - 15 sliding scale above, or one of the
+		strings: "performance", "balance-performance", "normal",
+		"balance-power", "power" (that represent values reflected by
+		their meaning), to this attribute.
+
+		This attribute is present for all online CPUs supporting the
+		Intel EPB feature.
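
The input handling documented above (a named string first, otherwise a
number in 0 - 15) can be sketched in user-space C as follows. This is not
the kernel implementation: it uses strcmp-style matching and strtoull()
in place of __sysfs_match_string() and kstrtou64(), so edge cases such as
leading whitespace may differ. The function name epb_parse() is
hypothetical, and the numeric values given to the named strings (0, 4, 6,
8, 15) are an assumption based on the ENERGY_PERF_BIAS_* constants as they
are commonly defined in the kernel's msr-index.h; the patch itself does not
show them.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EPB 15

static const char * const epb_names[] = {
	"performance", "balance-performance", "normal",
	"balance-power", "power"
};
/* Assumed values of the ENERGY_PERF_BIAS_* constants (see lead-in). */
static const unsigned char epb_values[] = { 0, 4, 6, 8, 15 };

/*
 * Parse a setting as it might be written to energy_perf_bias.
 * Returns the EPB value (0 - 15) on success, or -1 for invalid input.
 */
int epb_parse(const char *s)
{
	size_t len = strlen(s);
	unsigned long long v;
	char *end;
	size_t i;

	/* Writes from the shell usually end in a newline; ignore one. */
	if (len && s[len - 1] == '\n')
		len--;
	if (!len)
		return -1;

	/* Named hints take precedence over numbers. */
	for (i = 0; i < sizeof(epb_names) / sizeof(epb_names[0]); i++) {
		if (strlen(epb_names[i]) == len && !strncmp(s, epb_names[i], len))
			return epb_values[i];
	}

	/* Base 0 accepts decimal, octal (0...) and hex (0x...) input. */
	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno || end != s + len || v > MAX_EPB)
		return -1;

	return (int)v;
}
```

Matching the strings before attempting a numeric conversion mirrors the
order used in energy_perf_bias_store() in the patch, and the range check
against MAX_EPB is what turns out-of-range numbers into an error.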