powercap/rapl: Do not load in virtualized environments

Message ID 1463484845-9839-1-git-send-email-prarit@redhat.com (mailing list archive)
State Changes Requested, archived

Commit Message

Prarit Bhargava May 17, 2016, 11:34 a.m. UTC
intel_rapl is currently not supported in virtualized environments.  On
boot, the warning message

intel_rapl: no valid rapl domains found in package 0

is output for every virtual core.

This patch stops the driver from being loaded in virtual boots.

Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Jacob Jun Pan <jacob.jun.pan@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 drivers/powercap/intel_rapl.c |    5 +++++
 1 file changed, 5 insertions(+)

Comments

Rafael J. Wysocki May 18, 2016, 12:50 a.m. UTC | #1
On Tue, May 17, 2016 at 1:34 PM, Prarit Bhargava <prarit@redhat.com> wrote:
> intel_rapl is currently not supported in virtualized environments.  When
> booting the warning message
>
> intel_rapl: no valid rapl domains found in package 0

You seem to be saying that this message is problematic for some
reason, so why is it?

> is output for every virtual core.
>
> This patch stops the driver from being loaded in virtual boots.
>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Jacob Jun Pan <jacob.jun.pan@intel.com>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: linux-pm@vger.kernel.org
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> ---
>  drivers/powercap/intel_rapl.c |    5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index f2201d42a9cd..bebfbe8acccd 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -29,6 +29,7 @@
>  #include <linux/sysfs.h>
>  #include <linux/cpu.h>
>  #include <linux/powercap.h>
> +#include <asm/hypervisor.h>
>  #include <asm/iosf_mbi.h>
>
>  #include <asm/processor.h>
> @@ -1600,6 +1601,10 @@ static int __init rapl_init(void)
>                 return -ENODEV;
>         }
>
> +       /* Intel RAPL is not supported on virtualized environments */
> +       if (x86_hyper)
> +               return -ENODEV;
> +
>         rapl_defaults = (struct rapl_defaults *)id->driver_data;
>
>         cpu_notifier_register_begin();
> --
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava May 18, 2016, 9:59 a.m. UTC | #2
On 05/17/2016 08:50 PM, Rafael J. Wysocki wrote:
> On Tue, May 17, 2016 at 1:34 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>> intel_rapl is currently not supported in virtualized environments.  When
>> booting the warning message
>>
>> intel_rapl: no valid rapl domains found in package 0
> 
> You seem to be saying that this message is problematic for some
> reason, so why is it?

It is a loud warning message that comes up for every cpu.  The virt boot is
essentially silent *except* for the above messages.

P.
Prarit Bhargava May 18, 2016, 12:38 p.m. UTC | #3
On 05/17/2016 08:50 PM, Rafael J. Wysocki wrote:
> On Tue, May 17, 2016 at 1:34 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>> intel_rapl is currently not supported in virtualized environments.  When
>> booting the warning message
>>
>> intel_rapl: no valid rapl domains found in package 0
> 
> You seem to be saying that this message is problematic for some
> reason, so why is it?
> 

After thinking about my previous answer, I realized I didn't give you enough
background, Rafael.  Virtual environments won't use this feature, as it is
meant for restricting power consumption at the HW level.

So ... here's the situation.  Most CPU features from Intel have a CPU feature
bit (also known in some circles as cpuflags) set for them.  For example MCE has
an mce bit that is exposed in /proc/cpuinfo.  Unfortunately, for Intel RAPL
there is no bit (I don't know if someone dropped the ball or if Intel
intentionally left this feature off ... I've heard both explanations :)).

In any case, the Intel RAPL driver is one of the few CPU-based drivers in the
kernel that still does an x86_match_cpu() against supported CPUs.  This means
that for virtual cpus which export the host cpu's model number, the intel_rapl
driver will attempt to load for each cpu.

As a result the message

intel_rapl: no valid rapl domains found in package 0

is output as a *visible* error to the user for each virtual core.
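
The model-table matching described above can be sketched in userspace C.
This is only an illustration, assuming a hypothetical struct cpu_id and a
hypothetical three-entry table; the driver's real table and the kernel's
struct x86_cpu_id are not reproduced here.  The point is that the match looks
only at vendor/family/model, so a guest that exports the host's model number
passes exactly like the host does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel CPU-ID table entry. */
struct cpu_id {
	unsigned int vendor;	/* 0 means Intel in this sketch */
	unsigned int family;
	unsigned int model;
};

/* Illustrative subset only; not the driver's actual list. */
static const struct cpu_id rapl_ids[] = {
	{ 0, 6, 0x3c },		/* Haswell */
	{ 0, 6, 0x3d },		/* Broadwell */
	{ 0, 6, 0x5e },		/* Skylake */
};

/*
 * Return true if the CPU's vendor/family/model is in the table.
 * Nothing here can tell a bare-metal CPU from a virtual one.
 */
static bool match_cpu(const struct cpu_id *cpu)
{
	assert(cpu != NULL);

	for (size_t i = 0; i < sizeof(rapl_ids) / sizeof(rapl_ids[0]); i++) {
		if (rapl_ids[i].vendor == cpu->vendor &&
		    rapl_ids[i].family == cpu->family &&
		    rapl_ids[i].model == cpu->model)
			return true;
	}
	return false;
}
```

Calling match_cpu() with { 0, 6, 0x3c } succeeds whether that triple describes
a physical Haswell or a guest configured to report the host's model, which is
why the driver goes on to probe for RAPL domains in the guest.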

The error is valid for native cpus (although across hundreds of systems I can
say I've never seen the warning output on a native cpu) but it is clearly not
valid for virtual cpus *because virtualized systems don't use this feature*.

The driver shouldn't load on virt systems.  That's the bottom line here, and the
patch prevents that from happening.  Would I prefer that there were some other
mechanism to detect RAPL?  Yep.  I really really would.  But beyond mucking with
MSRs (which is definitely more complicated and awful than this simple check) I
don't see any easier method than the one I've proposed.
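
For a sense of what "mucking with MSRs" could involve, here is a rough
userspace sketch, not the approach proposed in this patch.  It assumes the
Linux msr driver is loaded and readable (normally root only), so that
/dev/cpu/N/msr can be read at an offset equal to the MSR number, and it uses
MSR_RAPL_POWER_UNIT (0x606) as the register to probe.  The function names are
made up for illustration.  What a guest sees when reading this MSR depends on
the hypervisor, so a failed read is only a hint that RAPL is unavailable.

```c
#define _XOPEN_SOURCE 700	/* for pread() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define MSR_RAPL_POWER_UNIT 0x606

/*
 * Read the raw RAPL power-unit MSR through an msr device node such as
 * /dev/cpu/0/msr.  Returns 0 and stores the value on success, -1 if the
 * node cannot be opened or the 8-byte read does not complete.
 */
static int read_rapl_power_unit(const char *msr_path, uint64_t *value)
{
	ssize_t n;
	int fd;

	assert(msr_path != NULL && value != NULL);

	fd = open(msr_path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, value, sizeof(*value), MSR_RAPL_POWER_UNIT);
	close(fd);

	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}

/* Energy status unit exponent: bits 12:8 of the power-unit MSR. */
static unsigned int rapl_energy_unit_exp(uint64_t raw)
{
	return (unsigned int)((raw >> 8) & 0x1f);
}
```

Even in this reduced form the probe needs a device node, privileges and error
handling, which gives some idea of why a single-line check looked attractive
by comparison.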

I really don't want to be the one who sets the precedent of abusing x86_hyper in
this way.  I know it isn't the "right" thing to do -- but I honestly do not see
a better or cleaner way out of this.

P.
Rafael J. Wysocki May 18, 2016, 11:06 p.m. UTC | #4
On Wed, May 18, 2016 at 2:38 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>
>
> On 05/17/2016 08:50 PM, Rafael J. Wysocki wrote:
>> On Tue, May 17, 2016 at 1:34 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>>> intel_rapl is currently not supported in virtualized environments.  When
>>> booting the warning message
>>>
>>> intel_rapl: no valid rapl domains found in package 0
>>
>> You seem to be saying that this message is problematic for some
>> reason, so why is it?
>>
>
> I thought about my previous answer and after thinking about it realized I didn't
> give you enough background Rafael.  Virtual environments won't use this feature
> as this is meant for restricting power consumption at the HW level.
>
> So ... here's the situation.  Most CPU features from Intel have a CPU feature
> bit (also known in some circles as cpuflags) set for them.  For example MCE has
> an mce bit that is exposed in /proc/cpuinfo.  Unfortunately, for Intel RAPL
> there is no bit (I don't know if someone dropped the ball or if Intel
> intentionally left this feature off ... I've heard both explanations :)).
>
> In any case the Intel RAPL driver is one of the few cpu based drivers in the
> kernel that still does a x86_match_cpu() against supported CPUs.  This means for
> virtual cpus which export the host cpu's cpu model number, the intel_rapl driver
> will attempt to load for each cpu.
>
> As a result the message
>
> intel_rapl: no valid rapl domains found in package 0
>
> is output as a *visible* error to the user for each virtual core.
>
> The error is valid for native cpus (although over 100s of systems I can say I've
> never seen the warning output on a native cpu) but it is clearly not valid for
> virtual cpus *because virtualized systems don't use this feature*.
>
> The driver shouldn't load on virt systems.  That's the bottom line here, and the
> patch prevents that from happening.  Would I prefer that there were some other
> mechanism to detect RAPL?  Yep.  I really really would.  But beyond mucking with
> MSRs (which is definitely more complicated and awful than this simple check) I
> don't see any easier method than the one I've proposed.
>
> I really don't want to be the one who sets the precedent of abusing x86_hyper in
> this way.  I know it isn't the "right" thing to do -- but I honestly do not see
> a better or cleaner way out of this.

One quite obvious alternative might be to reduce the log level of the
message in question, say to pr_debug.
Prarit Bhargava May 18, 2016, 11:36 p.m. UTC | #5
On 05/18/2016 07:06 PM, Rafael J. Wysocki wrote:
> On Wed, May 18, 2016 at 2:38 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>>
>>
>> On 05/17/2016 08:50 PM, Rafael J. Wysocki wrote:
>>> On Tue, May 17, 2016 at 1:34 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>>>> intel_rapl is currently not supported in virtualized environments.  When
>>>> booting the warning message
>>>>
>>>> intel_rapl: no valid rapl domains found in package 0
>>>
>>> You seem to be saying that this message is problematic for some
>>> reason, so why is it?
>>>
>>
>> I thought about my previous answer and after thinking about it realized I didn't
>> give you enough background Rafael.  Virtual environments won't use this feature
>> as this is meant for restricting power consumption at the HW level.
>>
>> So ... here's the situation.  Most CPU features from Intel have a CPU feature
>> bit (also known in some circles as cpuflags) set for them.  For example MCE has
>> an mce bit that is exposed in /proc/cpuinfo.  Unfortunately, for Intel RAPL
>> there is no bit (I don't know if someone dropped the ball or if Intel
>> intentionally left this feature off ... I've heard both explanations :)).
>>
>> In any case the Intel RAPL driver is one of the few cpu based drivers in the
>> kernel that still does a x86_match_cpu() against supported CPUs.  This means for
>> virtual cpus which export the host cpu's cpu model number, the intel_rapl driver
>> will attempt to load for each cpu.
>>
>> As a result the message
>>
>> intel_rapl: no valid rapl domains found in package 0
>>
>> is output as a *visible* error to the user for each virtual core.
>>
>> The error is valid for native cpus (although over 100s of systems I can say I've
>> never seen the warning output on a native cpu) but it is clearly not valid for
>> virtual cpus *because virtualized systems don't use this feature*.
>>
>> The driver shouldn't load on virt systems.  That's the bottom line here, and the
>> patch prevents that from happening.  Would I prefer that there were some other
>> mechanism to detect RAPL?  Yep.  I really really would.  But beyond mucking with
>> MSRs (which is definitely more complicated and awful than this simple check) I
>> don't see any easier method than the one I've proposed.
>>
>> I really don't want to be the one who sets the precedent of abusing x86_hyper in
>> this way.  I know it isn't the "right" thing to do -- but I honestly do not see
>> a better or cleaner way out of this.
> 
> One quite obvious alternative might be to reduce the log level of the
> message in question, say to pr_debug.

Yeah -- I thought about that too.  But that's really a band-aid, isn't it?  The
code shouldn't execute on virt.

P.

Rafael J. Wysocki May 18, 2016, 11:47 p.m. UTC | #6
On Thu, May 19, 2016 at 1:36 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>
>
> On 05/18/2016 07:06 PM, Rafael J. Wysocki wrote:
>> On Wed, May 18, 2016 at 2:38 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>>>
>>>
>>> On 05/17/2016 08:50 PM, Rafael J. Wysocki wrote:
>>>> On Tue, May 17, 2016 at 1:34 PM, Prarit Bhargava <prarit@redhat.com> wrote:
>>>>> intel_rapl is currently not supported in virtualized environments.  When
>>>>> booting the warning message
>>>>>
>>>>> intel_rapl: no valid rapl domains found in package 0
>>>>
>>>> You seem to be saying that this message is problematic for some
>>>> reason, so why is it?
>>>>
>>>
>>> I thought about my previous answer and after thinking about it realized I didn't
>>> give you enough background Rafael.  Virtual environments won't use this feature
>>> as this is meant for restricting power consumption at the HW level.
>>>
>>> So ... here's the situation.  Most CPU features from Intel have a CPU feature
>>> bit (also known in some circles as cpuflags) set for them.  For example MCE has
>>> an mce bit that is exposed in /proc/cpuinfo.  Unfortunately, for Intel RAPL
>>> there is no bit (I don't know if someone dropped the ball or if Intel
>>> intentionally left this feature off ... I've heard both explanations :)).
>>>
>>> In any case the Intel RAPL driver is one of the few cpu based drivers in the
>>> kernel that still does a x86_match_cpu() against supported CPUs.  This means for
>>> virtual cpus which export the host cpu's cpu model number, the intel_rapl driver
>>> will attempt to load for each cpu.
>>>
>>> As a result the message
>>>
>>> intel_rapl: no valid rapl domains found in package 0
>>>
>>> is output as a *visible* error to the user for each virtual core.
>>>
>>> The error is valid for native cpus (although over 100s of systems I can say I've
>>> never seen the warning output on a native cpu) but it is clearly not valid for
>>> virtual cpus *because virtualized systems don't use this feature*.
>>>
>>> The driver shouldn't load on virt systems.  That's the bottom line here, and the
>>> patch prevents that from happening.  Would I prefer that there were some other
>>> mechanism to detect RAPL?  Yep.  I really really would.  But beyond mucking with
>>> MSRs (which is definitely more complicated and awful than this simple check) I
>>> don't see any easier method than the one I've proposed.
>>>
>>> I really don't want to be the one who sets the precedent of abusing x86_hyper in
>>> this way.  I know it isn't the "right" thing to do -- but I honestly do not see
>>> a better or cleaner way out of this.
>>
>> One quite obvious alternative might be to reduce the log level of the
>> message in question, say to pr_debug.
>
> Yeah -- I thought about that too.  But that's really a band-aid, isn't it?  The
> code shouldn't execute on virt.

"It is not necessary to run that code on virt" would be a better way
to put it IMO.

That is a valid point, but then using x86_hyper to avoid that feels
beyond ugly to be honest.
Patch

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index f2201d42a9cd..bebfbe8acccd 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -29,6 +29,7 @@ 
 #include <linux/sysfs.h>
 #include <linux/cpu.h>
 #include <linux/powercap.h>
+#include <asm/hypervisor.h>
 #include <asm/iosf_mbi.h>
 
 #include <asm/processor.h>
@@ -1600,6 +1601,10 @@  static int __init rapl_init(void)
 		return -ENODEV;
 	}
 
+	/* Intel RAPL is not supported on virtualized environments */
+	if (x86_hyper)
+		return -ENODEV;
+
 	rapl_defaults = (struct rapl_defaults *)id->driver_data;
 
 	cpu_notifier_register_begin();
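
The x86_hyper test in the patch relies on the kernel's own hypervisor
detection.  For readers who want an analogous check outside the kernel, the
sketch below shows two related signals: the "hypervisor present" bit, which is
bit 31 of ECX from CPUID leaf 1, and the "hypervisor" flag that Linux lists in
/proc/cpuinfo.  This is a similar check, not the same one the kernel makes
when it sets x86_hyper, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the CPUID.1:ECX value has the hypervisor-present bit (31) set. */
static bool cpuid1_ecx_hypervisor(uint32_t ecx)
{
	return (ecx >> 31) & 1u;
}

/*
 * True if a /proc/cpuinfo "flags" line contains the given flag as a
 * whole word, e.g. has_flag(line, "hypervisor").
 */
static bool has_flag(const char *flags, const char *flag)
{
	size_t len;

	assert(flags != NULL && flag != NULL);

	len = strlen(flag);
	if (len == 0)
		return false;

	for (const char *p = strstr(flags, flag); p != NULL;
	     p = strstr(p + 1, flag)) {
		bool starts = (p == flags) || p[-1] == ' ' ||
			      p[-1] == '\t' || p[-1] == ':';
		bool ends = p[len] == '\0' || p[len] == ' ' ||
			    p[len] == '\n';

		if (starts && ends)
			return true;
	}
	return false;
}
```

Both helpers take already-obtained values (a register value, a text line)
rather than executing CPUID or opening /proc/cpuinfo themselves, which keeps
the sketch portable; how those inputs are gathered is left to the caller.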