diff mbox series

kvm: x86: Print "disabled by bios" only once per host

Message ID 20200214143035.607115-1-e.velu@criteo.com (mailing list archive)
State New, archived
Headers show
Series kvm: x86: Print "disabled by bios" only once per host | expand

Commit Message

Erwan Velu Feb. 14, 2020, 2:30 p.m. UTC
The current behavior is to print a "disabled by bios" message per CPU thread.
As modern CPUs can have up to 64 cores, 128 on a dual socket, and turns this
printk to be a pretty noisy by showing up to 256 times the same line in a row.

This patch offer to only print the message once per host considering the BIOS will
disabled the feature for all sockets/cores at once and not on a per core basis.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Sean Christopherson Feb. 14, 2020, 5:05 p.m. UTC | #1
On Fri, Feb 14, 2020 at 03:30:35PM +0100, Erwan Velu wrote:
> The current behavior is to print a "disabled by bios" message per CPU thread.
> As modern CPUs can have up to 64 cores, 128 on a dual socket, and turns this
> printk to be a pretty noisy by showing up to 256 times the same line in a row.
> 
> This patch offer to only print the message once per host considering the BIOS will
> disabled the feature for all sockets/cores at once and not on a per core basis.

This has come up before[*].  Using _once() doesn't fully solve the issue
when KVM is built as a module.  The spam is more than likely a userspace
bug, i.e. userspace is probing KVM on every CPU.

[*] https://lkml.kernel.org/r/20190826182320.9089-1-tony.luck@intel.com

> Signed-off-by: Erwan Velu <e.velu@criteo.com>
> ---
>  arch/x86/kvm/x86.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fbabb2f06273..8f0d7a09d453 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7300,7 +7300,7 @@ int kvm_arch_init(void *opaque)
>  		goto out;
>  	}
>  	if (ops->disabled_by_bios()) {
> -		printk(KERN_ERR "kvm: disabled by bios\n");
> +		printk_once(KERN_ERR "kvm: disabled by bios\n");
>  		r = -EOPNOTSUPP;
>  		goto out;
>  	}
> -- 
> 2.24.1
>
Jim Mattson Feb. 14, 2020, 5:43 p.m. UTC | #2
On Fri, Feb 14, 2020 at 6:33 AM Erwan Velu <erwanaliasr1@gmail.com> wrote:
>
> The current behavior is to print a "disabled by bios" message per CPU thread.
> As modern CPUs can have up to 64 cores, 128 on a dual socket, and turns this
> printk to be a pretty noisy by showing up to 256 times the same line in a row.
>
> This patch offer to only print the message once per host considering the BIOS will
> disabled the feature for all sockets/cores at once and not on a per core basis.

That's quite an assumption you're making. I guess I've seen more
broken BIOSes than you have. :-) Still, I would rather see the message
just once--perhaps with an additional warning if all logical
processors aren't in agreement.
Erwan Velu Feb. 18, 2020, 4:28 p.m. UTC | #3
On 14/02/2020 18:05, Sean Christopherson wrote:
> This has come up before[*].  Using _once() doesn't fully solve the issue
> when KVM is built as a module.  The spam is more than likely a userspace
> bug, i.e. userspace is probing KVM on every CPU.

I made some progress on this.


That's "/usr/bin/udevadm trigger --type=devices --action=add" the culprit.

It does echo "add" in /sys/devices/system/cpu/cpu<x>/uevent

For the each cpu, it does the 'add' which trigger the "disabled by bios" 
message from kvm_arch_init.

Note that doing a "add" on the same processor will trigger the same 
message at every "add" event.


So I tried the patch of using pr_err_once() instead of printk() and the 
behavior is fine : despite the number of "add" generated, there is a 
single line being printed out.

Without the patch, every "add" generates the "disabled by bios" message.


So the question is : do we want to handle the case where a possible bios 
missed the configuration of some cores ?

If no, then the patch is fine and could be submitted. I don't see the 
need of printing this message at every call as it pollute the kernel log.

If yes, then we need to keep a trace of the number of enabled/disabled 
cores so we can report a mismatch. As this message seems printed per 
cpu, that would kind of mean a global variable right ?


What are your recommendations on this ?


Erwan,
Sean Christopherson Feb. 18, 2020, 6:48 p.m. UTC | #4
On Tue, Feb 18, 2020 at 05:28:49PM +0100, Erwan Velu wrote:
> On 14/02/2020 18:05, Sean Christopherson wrote:
> >This has come up before[*].  Using _once() doesn't fully solve the issue
> >when KVM is built as a module.  The spam is more than likely a userspace
> >bug, i.e. userspace is probing KVM on every CPU.
> 
> I made some progress on this.
> 
> 
> That's "/usr/bin/udevadm trigger --type=devices --action=add" the culprit.
> 
> It does echo "add" in /sys/devices/system/cpu/cpu<x>/uevent
> 
> For the each cpu, it does the 'add' which trigger the "disabled by bios"
> message from kvm_arch_init.
> 
> Note that doing a "add" on the same processor will trigger the same message
> at every "add" event.
> 
> 
> So I tried the patch of using pr_err_once() instead of printk() and the
> behavior is fine : despite the number of "add" generated, there is a single
> line being printed out.
> 
> Without the patch, every "add" generates the "disabled by bios" message.

That's a sort of unintentional side effect of KVM being split into two
modules, kvm and kvm_{intel,amd}.  E.g. if userspace did 'rmmod kvm' on
failure of 'modprobe kvm_intel' then using _*once() would be ineffective.
 
> So the question is : do we want to handle the case where a possible bios
> missed the configuration of some cores ?

That's a question for AMD/SVM.  Starting with kernel 5.6, Intel/VMX checks
for BIOS enabling on all CPUs.

That being said, checking for correct BIOS configuration on all CPUs is
orthogonal to this print statement issue.  Probing kvm_intel on every CPU
doesn't do anything to address a misoncifgured BIOS, e.g. if VMX/SVM is
fully supported on CPU0 then additional probes of kvm_{intel,amd} are nops,
they don't actually check for support on other CPUs.

> If no, then the patch is fine and could be submitted. I don't see the need
> of printing this message at every call as it pollute the kernel log.
> 
> If yes, then we need to keep a trace of the number of enabled/disabled cores
> so we can report a mismatch. As this message seems printed per cpu, that
> would kind of mean a global variable right ?
>
> What are your recommendations on this ?

Fix userspace to only do the "add" on one CPU.

Changing kvm_arch_init() to use pr_err_once() for the disabled_by_bios()
case "works", but it's effectively a hack to workaround a flawed userspace.
Erwan Velu Feb. 19, 2020, 11:19 a.m. UTC | #5
On 18/02/2020 19:48, Sean Christopherson wrote:
[...]
> Fix userspace to only do the "add" on one CPU.
>
> Changing kvm_arch_init() to use pr_err_once() for the disabled_by_bios()
> case "works", but it's effectively a hack to workaround a flawed userspace.

I'll see with the user space tool to sort this out.

I'm also considering how "wrong" is what they do: udevadm trigger is 
generating 3500  "uevent add" on my system and I only noticed kvm to 
print this noisy message.

So as the print once isn't that "wrong" neither, this simple patch would 
avoid polluting the kernel logs.


So my proposal would be

- have this simple patch on the kernel side to avoid having userspace 
apps polluting logs

- contacting the udev people to see the rational & fix it too : I'll do that


As you said, once probed, there is no need reprinting the same message 
again as the situation cannot have changed.

As we are on the preliminary return code path (to make a EOPNOTSUPP), I 
would vote for protecting the print against multiple calls/prints.

The kernel patch isn't impacting anyone (in a regular case) and just 
avoid pollution.


Would you agree on that ?

Erwan,
Erwan Velu Feb. 19, 2020, 11:32 a.m. UTC | #6
On 19/02/2020 12:19, Erwan Velu wrote:
[...]
> - contacting the udev people to see the rational & fix it too : I'll 
> do that 

I created https://github.com/systemd/systemd/issues/14906, if some want 
to participate/follow the discussions with systemd.

Erwan,
Sean Christopherson Feb. 19, 2020, 4:18 p.m. UTC | #7
On Wed, Feb 19, 2020 at 12:19:01PM +0100, Erwan Velu wrote:
> On 18/02/2020 19:48, Sean Christopherson wrote:
> [...]
> >Fix userspace to only do the "add" on one CPU.
> >
> >Changing kvm_arch_init() to use pr_err_once() for the disabled_by_bios()
> >case "works", but it's effectively a hack to workaround a flawed userspace.
> 
> I'll see with the user space tool to sort this out.
> 
> I'm also considering how "wrong" is what they do: udevadm trigger is
> generating 3500  "uevent add" on my system and I only noticed kvm to print
> this noisy message.
> 
> So as the print once isn't that "wrong" neither, this simple patch would
> avoid polluting the kernel logs.
> 
> 
> So my proposal would be
> 
> - have this simple patch on the kernel side to avoid having userspace apps
> polluting logs
> 
> - contacting the udev people to see the rational & fix it too : I'll do that
> 
> 
> As you said, once probed, there is no need reprinting the same message again
> as the situation cannot have changed.

For this exact scenario, on Intel/VMX, this is mostly true.  But, the MSR
check for AMD/SVM has a disable bit that takes effect irrespective of the
MSR's locked bit, i.e. SVM could theoretically change state without any
super special behavior.

Even on Intel, the state can potentially change, especially on a system
with a misbehaving BIOS.  FEATURE_CONTROL is cleared on CPU RESET, e.g. VMX
enabling could change if BIOS "forgets" to reinitialize the MSR upon waking
from S3 (suspend).  Things get really weird if we consider the case where
BIOS leaves the MSR unlocked after S3, the user manually writes the MSR,
and then it gets cleared again on a different S3 transition.

SVM is even more sensitive because VM_CR is cleared on INIT, not just RESET.

> As we are on the preliminary return code path (to make a EOPNOTSUPP), I
> would vote for protecting the print against multiple calls/prints.
> 
> The kernel patch isn't impacting anyone (in a regular case) and just avoid
> pollution.
> 
> Would you agree on that ?

Sadly, no.  Don't get me wrong, I completely agree that, ideally, KVM would
not spam the log, even when presented with a misbehaving userspace.

My hesitation about changing the error message to pr_err_once() isn't so
much about right versus wrong as it is about creating misleading and
potentially confusing code in KVM, and setting a precedent that I don't
think we want to carry forward.

E.g. the _once() doesn't hold true if module kvm is unloaded and other
error messages such as basic CPU support would still be unlimited.  The
basic CPU support message definitely should *not* be _once() as that would
squash messages when loading the wrong vendor module.

As for setting a precedent, moving the error message to the vendor module
or making kvm a monolithic module would "break" the _once() behavior.

And, the current systemd behavior is actually quite dangerous, e.g. on a
misconfigured system where SVM is enabled on a subset of CPUs, probing KVM
on every CPU essentially guarantees that KVM will be loaded on a broken
system.  In that case, I think we actually want the spam.  Note, as of
kernel 5.6, this doesn't apply to VMX as kvm_intel will no longer load on a
misconfigured system since FEATURE_CONTROL configuration is incorporated
into the per-CPU checks.

All of that being said, what about converting all of the error messages to
pr_err_ratelimited()?  That would take the edge off this particular problem,
wouldn't create incosistencies between error messages, and won't completely
squash error messages in corner case scenarios on misconfigured systems.
Erwan Velu Feb. 19, 2020, 4:53 p.m. UTC | #8
[..]

On 19/02/2020 17:18, Sean Christopherson wrote:
> All of that being said, what about converting all of the error messages to
> pr_err_ratelimited()?  That would take the edge off this particular problem,
> wouldn't create incosistencies between error messages, and won't completely
> squash error messages in corner case scenarios on misconfigured systems.

Thanks Sean for your very detailed answer.

I've been testing the ratelimited which is far better but still prints 
12 messages.

I saw the ratelimit is on about 5 sec, I wonder if we can explicit a 
longer one for this one.

I searched around this but it doesn't seems that hacking the delay is a 
common usage.

Do you have any insights/ideas around that ?


Switching to ratelimit could be done by replacing the actual call or add 
a macro similar to  kvm_pr_unimpl() so it can be reused easily.

I can offer the patches for that.


Erwan,
Erwan Velu Feb. 19, 2020, 5:42 p.m. UTC | #9
On 19/02/2020 12:32, Erwan Velu wrote:
> On 19/02/2020 12:19, Erwan Velu wrote:
> [...]
>> - contacting the udev people to see the rational & fix it too : I'll 
>> do that 
>
> I created https://github.com/systemd/systemd/issues/14906, if some 
> want to participate/follow the discussions with systemd.
>
It appear that its fixed on recent systemd, I tried at least starting 
from v239 this is fixed. I didn't tried earlier releases as they are 
painful to build.

To me that doesn't means that the ratelimited printing have no interest 
for all existing systems unless we weren't at least two reporting this ;)

Erwan,
Paolo Bonzini Feb. 19, 2020, 5:51 p.m. UTC | #10
On 19/02/20 17:53, Erwan Velu wrote:
> 
> 
> I've been testing the ratelimited which is far better but still prints
> 12 messages.

12 is already much better than 256.  Someone else will have an even
bigger system requiring a larger delay, so I'd go with the default.

Paolo

> I saw the ratelimit is on about 5 sec, I wonder if we can explicit a
> longer one for this one.
> 
> I searched around this but it doesn't seems that hacking the delay is a
> common usage.
> 
> Do you have any insights/ideas around that ?
> 
> 
> Switching to ratelimit could be done by replacing the actual call or add
> a macro similar to  kvm_pr_unimpl() so it can be reused easily.
Erwan Velu Feb. 19, 2020, 9:52 p.m. UTC | #11
I'll send a patch in this direction.
Thanks,

Le mer. 19 févr. 2020 à 18:51, Paolo Bonzini <pbonzini@redhat.com> a écrit :
>
> On 19/02/20 17:53, Erwan Velu wrote:
> >
> >
> > I've been testing the ratelimited which is far better but still prints
> > 12 messages.
>
> 12 is already much better than 256.  Someone else will have an even
> bigger system requiring a larger delay, so I'd go with the default.
>
> Paolo
>
> > I saw the ratelimit is on about 5 sec, I wonder if we can explicit a
> > longer one for this one.
> >
> > I searched around this but it doesn't seems that hacking the delay is a
> > common usage.
> >
> > Do you have any insights/ideas around that ?
> >
> >
> > Switching to ratelimit could be done by replacing the actual call or add
> > a macro similar to  kvm_pr_unimpl() so it can be reused easily.
>
diff mbox series

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fbabb2f06273..8f0d7a09d453 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7300,7 +7300,7 @@  int kvm_arch_init(void *opaque)
 		goto out;
 	}
 	if (ops->disabled_by_bios()) {
-		printk(KERN_ERR "kvm: disabled by bios\n");
+		printk_once(KERN_ERR "kvm: disabled by bios\n");
 		r = -EOPNOTSUPP;
 		goto out;
 	}