
[1/2] KVM: X86: Move ignore_msrs handling up the stack

Message ID 20200622220442.21998-2-peterx@redhat.com (mailing list archive)
State New, archived
Series KVM: X86: A few fixes around ignore_msrs

Commit Message

Peter Xu June 22, 2020, 10:04 p.m. UTC
MSR accesses can be one of:

  (1) KVM internal access,
  (2) userspace access (e.g., via KVM_SET_MSRS ioctl),
  (3) guest access.

The ignore_msrs option was previously handled in kvm_get_msr_common() and
kvm_set_msr_common(), at the bottom of the msr access stack.  That works in
most cases, but it can dump unwanted warning messages to dmesg even when KVM
gets/sets MSRs internally via __kvm_set_msr() or __kvm_get_msr()
(e.g. kvm_cpuid()).  Ideally we only want to trap cases (2) and (3) above,
not (1).

To achieve this, move the ignore_msrs handling up to the callers of
__kvm_get_msr() and __kvm_set_msr().  A new return value
(KVM_MSR_RET_INVALID==2) is introduced to identify the "msr missing" event.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c | 80 +++++++++++++++++++++++++++++++---------------
 arch/x86/kvm/x86.h |  2 ++
 2 files changed, 56 insertions(+), 26 deletions(-)

Comments

Sean Christopherson June 25, 2020, 6:15 a.m. UTC | #1
On Mon, Jun 22, 2020 at 06:04:41PM -0400, Peter Xu wrote:
> MSR accesses can be one of:
> 
>   (1) KVM internal access,
>   (2) userspace access (e.g., via KVM_SET_MSRS ioctl),
>   (3) guest access.
> 
> The ignore_msrs option was previously handled in kvm_get_msr_common() and
> kvm_set_msr_common(), at the bottom of the msr access stack.  That works in
> most cases, but it can dump unwanted warning messages to dmesg even when KVM
> gets/sets MSRs internally via __kvm_set_msr() or __kvm_get_msr()
> (e.g. kvm_cpuid()).  Ideally we only want to trap cases (2) and (3) above,
> not (1).
> 
> To achieve this, move the ignore_msrs handling up to the callers of
> __kvm_get_msr() and __kvm_set_msr().  A new return value
> (KVM_MSR_RET_INVALID==2) is introduced to identify the "msr missing" event.

IMO, kvm_cpuid() is simply buggy.  If KVM attempts to access a non-existent
MSR then it darn well should warn.

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8a294f9747aa..7ef7283011d6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1013,7 +1013,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
                *ebx = entry->ebx;
                *ecx = entry->ecx;
                *edx = entry->edx;
-               if (function == 7 && index == 0) {
+               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE))) &&
+                   (vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) {
                        u64 data;
                        if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
                            (data & TSX_CTRL_CPUID_CLEAR))
Paolo Bonzini June 25, 2020, 8:09 a.m. UTC | #2
On 25/06/20 08:15, Sean Christopherson wrote:
> IMO, kvm_cpuid() is simply buggy.  If KVM attempts to access a non-existent
> MSR then it darn well should warn.
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 8a294f9747aa..7ef7283011d6 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1013,7 +1013,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>                 *ebx = entry->ebx;
>                 *ecx = entry->ecx;
>                 *edx = entry->edx;
> -               if (function == 7 && index == 0) {
> +               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE))) &&
> +                   (vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) {
>                         u64 data;
>                         if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
>                             (data & TSX_CTRL_CPUID_CLEAR))
> 

That works too, but I disagree that warning is the correct behavior
here.  It certainly should warn as long as kvm_get_msr blindly returns
zero.  However, for a guest it's fine to access a potentially
non-existent MSR if you're ready to trap the #GP, and the point of this
series is to let cpuid.c or any other KVM code do the same.

Paolo
Sean Christopherson June 25, 2020, 4:25 p.m. UTC | #3
On Thu, Jun 25, 2020 at 10:09:13AM +0200, Paolo Bonzini wrote:
> On 25/06/20 08:15, Sean Christopherson wrote:
> > IMO, kvm_cpuid() is simply buggy.  If KVM attempts to access a non-existent
> > MSR then it darn well should warn.
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 8a294f9747aa..7ef7283011d6 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1013,7 +1013,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >                 *ebx = entry->ebx;
> >                 *ecx = entry->ecx;
> >                 *edx = entry->edx;
> > -               if (function == 7 && index == 0) {
> > +               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE))) &&
> > +                   (vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) {
> >                         u64 data;
> >                         if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> >                             (data & TSX_CTRL_CPUID_CLEAR))
> > 
> 
> That works too, but I disagree that warning is the correct behavior
> here.  It certainly should warn as long as kvm_get_msr blindly returns
> zero.  However, for a guest it's fine to access a potentially
> non-existent MSR if you're ready to trap the #GP, and the point of this
> series is to let cpuid.c or any other KVM code do the same.

I get the "what" of the change, and even the "why" to some extent, but I
dislike the idea of supporting/encouraging blind reads/writes to MSRs.
Blind writes are just asking for problems, and suppressing warnings on reads
is almost guaranteed to be suppressing a KVM bug.

Case in point, looking at the TSX thing again, I actually think the fix
should be:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5eb618dbf211..64322446e590 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1013,9 +1013,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
                *ebx = entry->ebx;
                *ecx = entry->ecx;
                *edx = entry->edx;
-               if (function == 7 && index == 0) {
+               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE)))) {
                        u64 data;
-                       if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+                       if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
                            (data & TSX_CTRL_CPUID_CLEAR))
                                *ebx &= ~(F(RTM) | F(HLE));
                }


On VMX, MSR_IA32_TSX_CTRL will be added to the so called shared MSR array
regardless of whether or not it is being advertised to userspace (this is
a bug in its own right).  Using the host_initiated variant means KVM will
incorrectly bypass VMX's ARCH_CAP_TSX_CTRL_MSR check, i.e. incorrectly
clear the bits if userspace is being weird and stuffed MSR_IA32_TSX_CTRL
without advertising it to the guest.

In short, the whole MSR_IA32_TSX_CTRL implementation seems messy and this
is just papering over that mess.  The correct fix is to invoke setup_msrs()
on writes to MSR_IA32_ARCH_CAPABILITIES, filtering MSR_IA32_TSX_CTRL out of
shared MSRs when it's not advertised, and change kvm_cpuid() to use the
unprivileged variant.

TSX_CTRL aside, if we insist on pointing a gun at our foot at some point,
this should be a dedicated flavor of MSR access, e.g. msr_data.kvm_initiated,
so that it at least requires intentionally loading the gun.
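
(For reference, a minimal sketch of what such a dedicated flavor could look
like, based on the existing struct msr_data layout in
arch/x86/include/asm/kvm_host.h; the kvm_initiated field is hypothetical and
only illustrates the suggestion above:)

	/* Sketch only: a hypothetical "KVM-internal" flavor of MSR access. */
	struct msr_data {
		bool host_initiated;
		bool kvm_initiated;	/* hypothetical: set only for KVM's own accesses */
		u32 index;
		u64 data;
	};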
Sean Christopherson June 25, 2020, 5:45 p.m. UTC | #4
On Thu, Jun 25, 2020 at 09:25:40AM -0700, Sean Christopherson wrote:
> On Thu, Jun 25, 2020 at 10:09:13AM +0200, Paolo Bonzini wrote:
> > On 25/06/20 08:15, Sean Christopherson wrote:
> > > IMO, kvm_cpuid() is simply buggy.  If KVM attempts to access a non-existent
> > > MSR then it darn well should warn.
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 8a294f9747aa..7ef7283011d6 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -1013,7 +1013,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> > >                 *ebx = entry->ebx;
> > >                 *ecx = entry->ecx;
> > >                 *edx = entry->edx;
> > > -               if (function == 7 && index == 0) {
> > > +               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE))) &&
> > > +                   (vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) {
> > >                         u64 data;
> > >                         if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> > >                             (data & TSX_CTRL_CPUID_CLEAR))
> > > 
> > 
> > That works too, but I disagree that warning is the correct behavior
> > here.  It certainly should warn as long as kvm_get_msr blindly returns
> > zero.  However, for a guest it's fine to access a potentially
> > non-existent MSR if you're ready to trap the #GP, and the point of this
> > series is to let cpuid.c or any other KVM code do the same.
> 
> I get the "what" of the change, and even the "why" to some extent, but I
> dislike the idea of supporting/encouraging blind reads/writes to MSRs.
> Blind writes are just asking for problems, and suppressing warnings on reads
> is almost guaranteed to be suppressing a KVM bug.
> 
> Case in point, looking at the TSX thing again, I actually think the fix
> should be:
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 5eb618dbf211..64322446e590 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1013,9 +1013,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>                 *ebx = entry->ebx;
>                 *ecx = entry->ecx;
>                 *edx = entry->edx;
> -               if (function == 7 && index == 0) {
> +               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE)))) {
>                         u64 data;
> -                       if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> +                       if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
>                             (data & TSX_CTRL_CPUID_CLEAR))
>                                 *ebx &= ~(F(RTM) | F(HLE));
>                 }
> 
> 
> On VMX, MSR_IA32_TSX_CTRL will be added to the so called shared MSR array
> regardless of whether or not it is being advertised to userspace (this is
> a bug in its own right).  Using the host_initiated variant means KVM will
> incorrectly bypass VMX's ARCH_CAP_TSX_CTRL_MSR check, i.e. incorrectly
> clear the bits if userspace is being weird and stuffed MSR_IA32_TSX_CTRL
> without advertising it to the guest.

Argh, belatedly realized that MSR_IA32_TSX_CTRL needs to be swapped even
when ARCH_CAP_TSX_CTRL_MSR isn't exposed to the guest, but if and only if
TSX is disabled in the host _and_ enabled in the guest.  So triggering
setup_msrs() on ARCH_CAP_TSX_CTRL_MSR is insufficient, but I believe we can
and should redo setup_msrs() during vmx_cpuid_update().  I'm pretty sure
that's needed for MSR_TSC_AUX+RDTSCP as well.  I suspect RDTSCP is broken
on 32-bit guests, but no one has noticed because Linux only employs RDTSCP on
64-bit kernels, and 32-bit guests aren't exactly common in the first place.

I'll check the above to confirm and prep some patches if RDTSCP is indeed
busted.

> In short, the whole MSR_IA32_TSX_CTRL implementation seems messy and this
> is just papering over that mess.  The correct fix is to invoke setup_msrs()
> on writes to MSR_IA32_ARCH_CAPABILITIES, filtering MSR_IA32_TSX_CTRL out of
> shared MSRs when it's not advertised, and change kvm_cpuid() to use the
> unprivileged variant.
> 
> TSX_CTRL aside, if we insist on pointing a gun at our foot at some point,
> this should be a dedicated flavor of MSR access, e.g. msr_data.kvm_initiated,
> so that it at least requires intentionally loading the gun.
Paolo Bonzini June 25, 2020, 6:44 p.m. UTC | #5
On 25/06/20 18:25, Sean Christopherson wrote:
> I get the "what" of the change, and even the "why" to some extent, but I
> dislike the idea of supporting/encouraging blind reads/writes to MSRs.
> Blind writes are just asking for problems, and suppressing warnings on reads
> is almost guaranteed to be suppressing a KVM bug.

Right, that's why this patch does not just suppress warnings: it adds a
different return value to detect the case.

> TSX_CTRL aside, if we insist on pointing a gun at our foot at some point,
> this should be a dedicated flavor of MSR access, e.g. msr_data.kvm_initiated,
> so that it at least requires intentionally loading the gun.

With this patch, __kvm_get_msr does not know about ignore_msrs at all,
that seems to be strictly an improvement; do you agree with that?  What
would you think about adding warn_unused_result to __kvm_get_msr?

Paolo
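
(For context: warn_unused_result is what the kernel's __must_check annotation
expands to, so the suggestion amounts to declaring the raw accessors roughly
as below; a sketch only, assuming the __kvm_get_msr() prototype shown in the
patch at the bottom of this page, with __kvm_set_msr() getting the same
treatment:)

	int __must_check __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index,
				       u64 *data, bool host_initiated);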
Sean Christopherson June 26, 2020, 3:56 p.m. UTC | #6
On Thu, Jun 25, 2020 at 08:44:16PM +0200, Paolo Bonzini wrote:
> On 25/06/20 18:25, Sean Christopherson wrote:
> > I get the "what" of the change, and even the "why" to some extent, but I
> > dislike the idea of supporting/encouraging blind reads/writes to MSRs.
> > Blind writes are just asking for problems, and suppressing warnings on reads
> > is almost guaranteed to be suppressing a KVM bug.
> 
> Right, that's why this patch does not just suppress warnings: it adds a
> different return value to detect the case.
> 
> > TSX_CTRL aside, if we insist on pointing a gun at our foot at some point,
> > this should be a dedicated flavor of MSR access, e.g. msr_data.kvm_initiated,
> > so that it at least requires intentionally loading the gun.
> 
> With this patch, __kvm_get_msr does not know about ignore_msrs at all,
> that seems to be strictly an improvement; do you agree with that?

Not really?  It's solving a problem that doesn't exist in the current code
base (assuming TSX_CTRL is fixed), and IMO solving it in an ugly fashion.

I would much prefer that, _if_ we want to support blind KVM-internal MSR
accesses, we end up with code like:

	if (msr_info->kvm_internal) {
		return 1;
	} else if (!ignore_msrs) {
		vcpu_debug_ratelimited(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
			    msr, data);
		return 1;
	} else {
		if (report_ignored_msrs)
			vcpu_unimpl(vcpu,
				"ignored wrmsr: 0x%x data 0x%llx\n",
				msr, data);
		break;
	}

But I'm still not convinced that there is a legitimate scenario for setting
kvm_internal=true.

> What would you think about adding warn_unused_result to __kvm_get_msr?

I guess I wouldn't object to it, but that seems like an orthogonal issue.
Peter Xu June 26, 2020, 5:37 p.m. UTC | #7
On Fri, Jun 26, 2020 at 08:56:57AM -0700, Sean Christopherson wrote:
> Not really?  It's solving a problem that doesn't exist in the current code
> base (assuming TSX_CTRL is fixed), and IMO solving it in an ugly fashion.
> 
> I would much prefer that, _if_ we want to support blind KVM-internal MSR
> accesses, we end up with code like:
> 
> 	if (msr_info->kvm_internal) {
> 		return 1;
> 	} else if (!ignore_msrs) {
> 		vcpu_debug_ratelimited(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
> 			    msr, data);
> 		return 1;
> 	} else {
> 		if (report_ignored_msrs)
> 			vcpu_unimpl(vcpu,
> 				"ignored wrmsr: 0x%x data 0x%llx\n",
> 				msr, data);
> 		break;
> 	}
> 
> But I'm still not convinced that there is a legitimate scenario for setting
> kvm_internal=true.

Actually this looks a lot like my initial version, which I discussed with
Paolo before posting this one; Paolo suggested what I implemented instead.  I
agree with Paolo that it's an improvement to have a way to get/set the real
msr value, so that we don't need to think further about the effects of the
two tricky msr knobs (report_ignored_msrs, ignore_msrs).  These knobs are
even trickier to me when they're hidden deep, because their effects aren't
obvious from the function names (e.g. __kvm_set_msr, rather than
__kvm_set_msr_retval_fixed).

Thanks,
Sean Christopherson June 26, 2020, 5:46 p.m. UTC | #8
On Fri, Jun 26, 2020 at 01:37:50PM -0400, Peter Xu wrote:
> On Fri, Jun 26, 2020 at 08:56:57AM -0700, Sean Christopherson wrote:
> > Not really?  It's solving a problem that doesn't exist in the current code
> > base (assuming TSX_CTRL is fixed), and IMO solving it in an ugly fashion.
> > 
> > I would much prefer that, _if_ we want to support blind KVM-internal MSR
> > accesses, we end up with code like:
> > 
> > 	if (msr_info->kvm_internal) {
> > 		return 1;
> > 	} else if (!ignore_msrs) {
> > 		vcpu_debug_ratelimited(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
> > 			    msr, data);
> > 		return 1;
> > 	} else {
> > 		if (report_ignored_msrs)
> > 			vcpu_unimpl(vcpu,
> > 				"ignored wrmsr: 0x%x data 0x%llx\n",
> > 				msr, data);
> > 		break;
> > 	}
> > 
> > But I'm still not convinced that there is a legitimate scenario for setting
> > kvm_internal=true.
> 
> Actually this looks a lot like my initial version, which I discussed with
> Paolo before posting this one; Paolo suggested what I implemented instead.  I
> agree with Paolo that it's an improvement to have a way to get/set the real
> msr value, so that we don't need to think further about the effects of the
> two tricky msr knobs (report_ignored_msrs, ignore_msrs).  These knobs are
> even trickier to me when they're hidden deep, because their effects aren't
> obvious from the function names (e.g. __kvm_set_msr, rather than
> __kvm_set_msr_retval_fixed).

My argument is that it's a KVM bug if we ever do the wrong thing
based on a KVM-internal MSR access.  The proposed change would actually make
it _harder_ to find the bug that prompted this patch, as the bogus
__kvm_get_msr() in kvm_cpuid() would silently fail.

If anything, I would argue for something like:

	if (WARN_ON_ONCE(msr_info->kvm_internal)) {
		return 1;
	} else if (!ignore_msrs) {
		...
	} else {
		...
	}

I.e. KVM-internal accesses should always pre-validate the existence of the
MSR, if not the validity of the MSR from the guest's perspective.
Peter Xu June 26, 2020, 6:07 p.m. UTC | #9
On Thu, Jun 25, 2020 at 09:25:40AM -0700, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 5eb618dbf211..64322446e590 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1013,9 +1013,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>                 *ebx = entry->ebx;
>                 *ecx = entry->ecx;
>                 *edx = entry->edx;
> -               if (function == 7 && index == 0) {
> +               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE)))) {
>                         u64 data;
> -                       if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> +                       if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
>                             (data & TSX_CTRL_CPUID_CLEAR))
>                                 *ebx &= ~(F(RTM) | F(HLE));
>                 }
> 
> 
> On VMX, MSR_IA32_TSX_CTRL will be added to the so called shared MSR array
> regardless of whether or not it is being advertised to userspace (this is
> a bug in its own right).  Using the host_initiated variant means KVM will
> incorrectly bypass VMX's ARCH_CAP_TSX_CTRL_MSR check, i.e. incorrectly
> clear the bits if userspace is being weird and stuffed MSR_IA32_TSX_CTRL
> without advertising it to the guest.

Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
to have such a fix?
Sean Christopherson June 26, 2020, 6:18 p.m. UTC | #10
On Fri, Jun 26, 2020 at 02:07:32PM -0400, Peter Xu wrote:
> On Thu, Jun 25, 2020 at 09:25:40AM -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 5eb618dbf211..64322446e590 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1013,9 +1013,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >                 *ebx = entry->ebx;
> >                 *ecx = entry->ecx;
> >                 *edx = entry->edx;
> > -               if (function == 7 && index == 0) {
> > +               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE)))) {
> >                         u64 data;
> > -                       if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> > +                       if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
> >                             (data & TSX_CTRL_CPUID_CLEAR))
> >                                 *ebx &= ~(F(RTM) | F(HLE));
> >                 }
> > 
> > 
> > On VMX, MSR_IA32_TSX_CTRL will be added to the so called shared MSR array
> > regardless of whether or not it is being advertised to userspace (this is
> > a bug in its own right).  Using the host_initiated variant means KVM will
> > incorrectly bypass VMX's ARCH_CAP_TSX_CTRL_MSR check, i.e. incorrectly
> > clear the bits if userspace is being weird and stuffed MSR_IA32_TSX_CTRL
> > without advertising it to the guest.
> 
> Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
> ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
> to have such a fix?

Not really.  That ends up duplicating the check in vmx_get_msr().  From an
emulation perspective, this really is a "guest" access to the MSR, in the
sense that the virtual CPU is in the guest domain, i.e. not a god-like
entity that gets to break the rules of emulation.

The other thing to consider is that SVM doesn't have any knowledge of any
of this, so arguably the "ignored msr" crud should get logged for SVM as
it's effectively a userspace bug if they've configured the VM to have RTM
or HLE on a system that can't possibly support either.
Peter Xu June 26, 2020, 7:11 p.m. UTC | #11
On Fri, Jun 26, 2020 at 11:18:20AM -0700, Sean Christopherson wrote:
> On Fri, Jun 26, 2020 at 02:07:32PM -0400, Peter Xu wrote:
> > On Thu, Jun 25, 2020 at 09:25:40AM -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 5eb618dbf211..64322446e590 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -1013,9 +1013,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> > >                 *ebx = entry->ebx;
> > >                 *ecx = entry->ecx;
> > >                 *edx = entry->edx;
> > > -               if (function == 7 && index == 0) {
> > > +               if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE)))) {
> > >                         u64 data;
> > > -                       if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> > > +                       if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
> > >                             (data & TSX_CTRL_CPUID_CLEAR))
> > >                                 *ebx &= ~(F(RTM) | F(HLE));
> > >                 }
> > > 
> > > 
> > > On VMX, MSR_IA32_TSX_CTRL will be added to the so called shared MSR array
> > > regardless of whether or not it is being advertised to userspace (this is
> > > a bug in its own right).  Using the host_initiated variant means KVM will
> > > incorrectly bypass VMX's ARCH_CAP_TSX_CTRL_MSR check, i.e. incorrectly
> > > clear the bits if userspace is being weird and stuffed MSR_IA32_TSX_CTRL
> > > without advertising it to the guest.
> > 
> > Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
> > ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
> > to have such a fix?
> 
> Not really.  That ends up duplicating the check in vmx_get_msr().  From an
> emulation perspective, this really is a "guest" access to the MSR, in the
> sense that the virtual CPU is in the guest domain, i.e. not a god-like
> entity that gets to break the rules of emulation.

I can't say I agree that it's guest behavior.  IMHO kvm plays the role of the
virtual processor.  If a bit in a cpuid entry depends on another MSR bit,
then reading that MSR value is "processor behavior", which in our case is
still host behavior.  It's exactly because we assumed it was guest behavior
that we got confused the first time we saw the "ignored rdmsr" error message,
even though the guest had no reason to issue one...  So even if you want to
keep those error messages, I'd really appreciate it if they could say
something else, so that we know it's not a guest rdmsr instruction.

To me, the existing tsx code is not a bug at all (IMHO the evil thing is the
tricky knobs and the fact that their handling hides so deep in the stack,
which is why I really want to move this series forward); instead, I think
it's quite elegant to be able to write things like below...

  if (!__kvm_read_msr(&data) && (data & XXX))
    ...

It's definitely subjective so I can't argue much...  However, it's somewhat
similar to rdmsr_safe and friends in that we don't need to remember two things
(cap+msr) but only the msr (and I bet I'm not the only one who likes that;
just look at how many callers the "safe" msr variants have...).

Considering that we still get the unexpected warning message on some hosts
with upgraded firmware, which potentially breaks some realtime systems, would
the simple and clear patch below be acceptable to you?

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 901cd1fdecd9..052c93997965 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1005,7 +1005,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
                *ebx = entry->ebx;
                *ecx = entry->ecx;
                *edx = entry->edx;
-               if (function == 7 && index == 0) {
+               if (function == 7 && index == 0 &&
+                   vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR) {
                        u64 data;
                        if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
                            (data & TSX_CTRL_CPUID_CLEAR))

Then we can further discuss whether and how we'd like to refactor the knobs
and the surrounding code.

Thanks,
Paolo Bonzini June 27, 2020, 2:24 p.m. UTC | #12
On 26/06/20 20:18, Sean Christopherson wrote:
>> Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
>> ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
>> to have such a fix?
> Not really.  That ends up duplicating the check in vmx_get_msr().  From an
> emulation perspective, this really is a "guest" access to the MSR, in the
> sense that the virtual CPU is in the guest domain, i.e. not a god-like
> entity that gets to break the rules of emulation.

But if you wrote a guest that wants to read MSR_IA32_TSX_CTRL, there are
two choices:

1) check ARCH_CAPABILITIES first

2) blindly access it and default to 0.

Both are fine, because we know MSR_IA32_TSX_CTRL has no
reserved/must-be-one bits.  Calling __kvm_get_msr and checking for an
invalid MSR through the return value is not breaking the rules of
emulation, it is "faking" a #GP handler.
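
(For illustration, the two options look roughly like this from the guest side;
a sketch using the kernel's rdmsrl()/rdmsrl_safe() helpers and the msr-index.h
constants, not code from this thread:)

	u64 caps = 0, tsx = 0;

	/* Option 1: check ARCH_CAPABILITIES before touching TSX_CTRL. */
	if (!rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &caps) &&
	    (caps & ARCH_CAP_TSX_CTRL_MSR))
		rdmsrl(MSR_IA32_TSX_CTRL, tsx);

	/* Option 2: blindly read TSX_CTRL and default to 0 if it #GPs. */
	if (rdmsrl_safe(MSR_IA32_TSX_CTRL, &tsx))
		tsx = 0;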

So I think Peter's patch is fine, but (possibly on top as a third patch)
__must_check should be added to MSR getters and setters.  Also one
possibility is to return -EINVAL for invalid MSRs.

Paolo
Sean Christopherson June 30, 2020, 3:47 p.m. UTC | #13
On Sat, Jun 27, 2020 at 04:24:34PM +0200, Paolo Bonzini wrote:
> On 26/06/20 20:18, Sean Christopherson wrote:
> >> Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
> >> ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
> >> to have such a fix?
> > Not really.  That ends up duplicating the check in vmx_get_msr().  From an
> > emulation perspective, this really is a "guest" access to the MSR, in the
> > sense that the virtual CPU is in the guest domain, i.e. not a god-like
> > entity that gets to break the rules of emulation.
> 
> But if you wrote a guest that wants to read MSR_IA32_TSX_CTRL, there are
> two choices:
> 
> 1) check ARCH_CAPABILITIES first
> 
> 2) blindly access it and default to 0.
> 
> Both are fine, because we know MSR_IA32_TSX_CTRL has no
> reserved/must-be-one bits.  Calling __kvm_get_msr and checking for an
> invalid MSR through the return value is not breaking the rules of
> emulation, it is "faking" a #GP handler.

"guest" was the wrong choice of word.  My point was that, IMO, emulation
should never set host_initiated=true.

To me, accessing MSRs with host_initiated is the equivalent of loading a
ucode patch, i.e. it's super duper special stuff that deliberately turns
off all safeguards and can change the fundamental behavior of the (virtual)
CPU.

Emulation on the other hand should either be subject to all the normal
rules or have dedicated, intelligent handling for breaking the normal rules,
e.g. nested usage of vmx_set_efer(), vmx_set_cr0, vmx_set_cr4, etc...

> So I think Peter's patch is fine, but (possibly on top as a third patch)
> __must_check should be added to MSR getters and setters.  Also one
> possibility is to return -EINVAL for invalid MSRs.
> 
> Paolo
>
Peter Xu July 9, 2020, 6:22 p.m. UTC | #14
On Tue, Jun 30, 2020 at 08:47:26AM -0700, Sean Christopherson wrote:
> On Sat, Jun 27, 2020 at 04:24:34PM +0200, Paolo Bonzini wrote:
> > On 26/06/20 20:18, Sean Christopherson wrote:
> > >> Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
> > >> ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
> > >> to have such a fix?
> > > Not really.  That ends up duplicating the check in vmx_get_msr().  From an
> > > emulation perspective, this really is a "guest" access to the MSR, in the
> > > sense that the virtual CPU is in the guest domain, i.e. not a god-like
> > > entity that gets to break the rules of emulation.
> > 
> > But if you wrote a guest that wants to read MSR_IA32_TSX_CTRL, there are
> > two choices:
> > 
> > 1) check ARCH_CAPABILITIES first
> > 
> > 2) blindly access it and default to 0.
> > 
> > Both are fine, because we know MSR_IA32_TSX_CTRL has no
> > reserved/must-be-one bits.  Calling __kvm_get_msr and checking for an
> > invalid MSR through the return value is not breaking the rules of
> > emulation, it is "faking" a #GP handler.
> 
> "guest" was the wrong choice of word.  My point was that, IMO, emulation
> should never set host_initiated=true.
> 
> To me, accessing MSRs with host_initiated is the equivalent of loading a
> ucode patch, i.e. it's super duper special stuff that deliberately turns
> off all safeguards and can change the fundamental behavior of the (virtual)
> CPU.

This seems to be orthogonal to what this series is trying to do.  We
use host_initiated=true in the current code, and this series won't change that
fact either.  As I mentioned in the other thread, at least the rdmsr warning is
ambiguous when it's not initiated from the guest without this patchset, and
this series could address that.

> 
> > So I think Peter's patch is fine, but (possibly on top as a third patch)
> > __must_check should be added to MSR getters and setters.  Also one
> > possibility is to return -EINVAL for invalid MSRs.

Yeah, I can add another patch for that.  Also, if I repost, I'd like to
introduce KVM_MSR_RET_[OK|ERROR] too, which seems cleaner now that we have
KVM_MSR_RET_INVALID.

Any objections before I repost?
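
(A hypothetical sketch of the return-value naming being floated here; the
actual names and values would be settled in the repost:)

	#define KVM_MSR_RET_OK       0	/* MSR access handled */
	#define KVM_MSR_RET_ERROR    1	/* access failed, typically surfaces as #GP */
	#define KVM_MSR_RET_INVALID  2	/* MSR does not exist / is not handled */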
Paolo Bonzini July 9, 2020, 6:24 p.m. UTC | #15
On 09/07/20 20:22, Peter Xu wrote:
>>> So I think Peter's patch is fine, but (possibly on top as a third patch)
>>> __must_check should be added to MSR getters and setters.  Also one
>>> possibility is to return -EINVAL for invalid MSRs.
> Yeah, I can add another patch for that.  Also, if I repost, I'd like to
> introduce KVM_MSR_RET_[OK|ERROR] too, which seems cleaner now that we have
> KVM_MSR_RET_INVALID.

You can use either that or 0/1/-EINVAL.  I would be fine with that to
keep the patch small.

Paolo
Peter Xu July 9, 2020, 6:34 p.m. UTC | #16
On Thu, Jul 09, 2020 at 08:24:03PM +0200, Paolo Bonzini wrote:
> On 09/07/20 20:22, Peter Xu wrote:
> >>> So I think Peter's patch is fine, but (possibly on top as a third patch)
> >>> __must_check should be added to MSR getters and setters.  Also one
> >>> possibility is to return -EINVAL for invalid MSRs.
> > Yeah, I can add another patch for that.  Also, if I repost, I'd like to
> > introduce KVM_MSR_RET_[OK|ERROR] too, which seems cleaner now that we have
> > KVM_MSR_RET_INVALID.
> 
> You can use either that or 0/1/-EINVAL.  I would be fine with that to
> keep the patch small.

Yeah -EINVAL looks good.  Then, do you want me to s/1/-EFAULT/ altogether?
Sean Christopherson July 9, 2020, 7:24 p.m. UTC | #17
On Thu, Jul 09, 2020 at 02:22:20PM -0400, Peter Xu wrote:
> On Tue, Jun 30, 2020 at 08:47:26AM -0700, Sean Christopherson wrote:
> > On Sat, Jun 27, 2020 at 04:24:34PM +0200, Paolo Bonzini wrote:
> > > On 26/06/20 20:18, Sean Christopherson wrote:
> > > >> Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
> > > >> ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
> > > >> to have such a fix?
> > > > Not really.  That ends up duplicating the check in vmx_get_msr().  From an
> > > > emulation perspective, this really is a "guest" access to the MSR, in the
> > > > sense that the virtual CPU is in the guest domain, i.e. not a god-like
> > > > entity that gets to break the rules of emulation.
> > > 
> > > But if you wrote a guest that wants to read MSR_IA32_TSX_CTRL, there are
> > > two choices:
> > > 
> > > 1) check ARCH_CAPABILITIES first
> > > 
> > > 2) blindly access it and default to 0.
> > > 
> > > Both are fine, because we know MSR_IA32_TSX_CTRL has no
> > > reserved/must-be-one bits.  Calling __kvm_get_msr and checking for an
> > > invalid MSR through the return value is not breaking the rules of
> > > emulation, it is "faking" a #GP handler.
> > 
> > "guest" was the wrong choice of word.  My point was that, IMO, emulation
> > should never set host_initiated=true.
> > 
> > To me, accessing MSRs with host_initiated is the equivalent of loading a
> > ucode patch, i.e. it's super duper special stuff that deliberately turns
> > off all safeguards and can change the fundamental behavior of the (virtual)
> > CPU.
> 
> This seems to be orthogonal to what this series is trying to do.  We
> use host_initiated=true in the current code, and this series won't change that
> fact either.  As I mentioned in the other thread, at least the rdmsr warning is
> ambiguous when it's not initiated from the guest without this patchset, and
> this series could address that.

My argument is that using host_initiated=true is wrong.  

> > > So I think Peter's patch is fine, but (possibly on top as a third patch)
> > > __must_check should be added to MSR getters and setters.  Also one
> > > possibility is to return -EINVAL for invalid MSRs.
> 
> Yeah, I can add another patch for that.  Also, if I repost, I'd like to
> introduce KVM_MSR_RET_[OK|ERROR] too, which seems cleaner now that we have
> KVM_MSR_RET_INVALID.
> 
> Any objections before I repost?

Heh, or perhaps "Any objections that haven't been overruled before I repost?" :-D
Peter Xu July 9, 2020, 9:09 p.m. UTC | #18
On Thu, Jul 09, 2020 at 12:24:40PM -0700, Sean Christopherson wrote:
> On Thu, Jul 09, 2020 at 02:22:20PM -0400, Peter Xu wrote:
> > On Tue, Jun 30, 2020 at 08:47:26AM -0700, Sean Christopherson wrote:
> > > On Sat, Jun 27, 2020 at 04:24:34PM +0200, Paolo Bonzini wrote:
> > > > On 26/06/20 20:18, Sean Christopherson wrote:
> > > > >> Btw, would it be more straightforward to check "vcpu->arch.arch_capabilities &
> > > > >> ARCH_CAP_TSX_CTRL_MSR" rather than "*ebx | (F(RTM) | F(HLE))" even if we want
> > > > >> to have such a fix?
> > > > > Not really.  That ends up duplicating the check in vmx_get_msr().  From an
> > > > > emulation perspective, this really is a "guest" access to the MSR, in the
> > > > > sense that the virtual CPU is in the guest domain, i.e. not a god-like
> > > > > entity that gets to break the rules of emulation.
> > > > 
> > > > But if you wrote a guest that wants to read MSR_IA32_TSX_CTRL, there are
> > > > two choices:
> > > > 
> > > > 1) check ARCH_CAPABILITIES first
> > > > 
> > > > 2) blindly access it and default to 0.
> > > > 
> > > > Both are fine, because we know MSR_IA32_TSX_CTRL has no
> > > > reserved/must-be-one bits.  Calling __kvm_get_msr and checking for an
> > > > invalid MSR through the return value is not breaking the rules of
> > > > emulation, it is "faking" a #GP handler.
> > > 
> > > "guest" was the wrong choice of word.  My point was that, IMO, emulation
> > > should never set host_initiated=true.
> > > 
> > > To me, accessing MSRs with host_initiated is the equivalent of loading a
> > > ucode patch, i.e. it's super duper special stuff that deliberately turns
> > > off all safeguards and can change the fundamental behavior of the (virtual)
> > > CPU.
> > 
> > This seems to be orthogonal to what this series is trying to do.  We
> > use host_initiated=true in the current code, and this series won't change that
> > fact either.  As I mentioned in the other thread, at least the rdmsr warning is
> > ambiguous when it's not initiated from the guest without this patchset, and
> > this series could address that.
> 
> My argument is that using host_initiated=true is wrong.  
> 
> > > > So I think Peter's patch is fine, but (possibly on top as a third patch)
> > > > __must_check should be added to MSR getters and setters.  Also one
> > > > possibility is to return -EINVAL for invalid MSRs.
> > 
> > Yeah, I can add another patch for that.  Also, if I repost, I'd like to
> > introduce KVM_MSR_RET_[OK|ERROR] too, which seems cleaner now that we have
> > KVM_MSR_RET_INVALID.
> > 
> > Any objections before I repost?
> 
> Heh, or perhaps "Any objections that haven't been overruled before I repost?" :-D

Again, isn't using host_initiated or not a different issue?  Frankly
speaking, I don't know whether it's an issue or not, but it's different from
what this series wants to do, because it'll be the same before/after this
series. Am I right?

Or, please explain what's the "overruled objection" that you're talking about..
Sean Christopherson July 9, 2020, 9:26 p.m. UTC | #19
On Thu, Jul 09, 2020 at 05:09:19PM -0400, Peter Xu wrote:
> Again, isn't using host_initiated or not a different issue?  Frankly
> speaking, I don't know whether it's an issue or not, but it's different from
> what this series wants to do, because it'll be the same before/after this
> series. Am I right?

I'm arguing that the TSX thing should be

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5eb618dbf211..e1fd5ac0df96 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1015,7 +1015,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
                *edx = entry->edx;
                if (function == 7 && index == 0) {
                        u64 data;
-                       if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+                       if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
                            (data & TSX_CTRL_CPUID_CLEAR))
                                *ebx &= ~(F(RTM) | F(HLE));
                }

At which point hoisting the ignored message up a few levels is pointless
because the only users of __kvm_*et_msr() will do the explicit ignored_check.
And I'm also arguing that KVM should never use __kvm_get_msr() for its own
actions, as host_initiated=true should only be used for host VMM accesses and
host_initiated=false actions should go through the proper checks and never
get to the ignored_msrs logic (assuming no KVM bug).

> Or, please explain what's the "overruled objection" that you're talking about..

Sean: Objection your honor.
Paolo: Overruled, you're wrong.
Sean: Phooey.

My point is that even though I still object to this series, Paolo has final
say.
Peter Xu July 9, 2020, 9:50 p.m. UTC | #20
On Thu, Jul 09, 2020 at 02:26:52PM -0700, Sean Christopherson wrote:
> On Thu, Jul 09, 2020 at 05:09:19PM -0400, Peter Xu wrote:
> > Again, using host_initiated or not should be a different issue?  Frankly
> > speaking, I don't know whether it's an issue or not, but it's different from
> > what this series wants to do, because it'll be the same before/after this
> > series. Am I right?
> 
> I'm arguing that the TSX thing should be
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 5eb618dbf211..e1fd5ac0df96 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1015,7 +1015,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
>                 *edx = entry->edx;
>                 if (function == 7 && index == 0) {
>                         u64 data;
> -                       if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> +                       if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
>                             (data & TSX_CTRL_CPUID_CLEAR))
>                                 *ebx &= ~(F(RTM) | F(HLE));
>                 }
> 
> At which point hoisting the ignored message up a few levels is pointless
> because the only users of __kvm_*et_msr() will do the explicit ignored_check.
> And I'm also arguing that KVM should never use __kvm_get_msr() for its own
> actions, as host_initiated=true should only be used for host VMM accesses and
> host_initiated=false actions should go through the proper checks and never
> get to the ignored_msrs logic (assuming no KVM bug).
> 
> > Or, please explain what's the "overruled objection" that you're talking about..
> 
> Sean: Objection your honor.
> Paolo: Overruled, you're wrong.
> Sean: Phooey.
> 
> My point is that even though I still object to this series, Paolo has final
> say.

I could be wrong, but I feel like Paolo was really respecting your input, as
always. It's just as simple as a 2:1 vote, isn't it? (I can still count myself
in for the vote, right? :)

Btw, you didn't reply to my other email:

  https://lore.kernel.org/kvm/20200626191118.GC175520@xz-x1/

Let me change the question a bit - Do you think e.g. we should never use
rdmsr*_safe() in the Linux kernel as long as the MSR has a bit somewhere
telling whether the MSR exists (so we should never trigger #GP on these MSRs)?
I think it's a similar question that we're discussing here..
Paolo Bonzini July 9, 2020, 10:11 p.m. UTC | #21
On 09/07/20 23:50, Peter Xu wrote:
>> Sean: Objection your honor.
>> Paolo: Overruled, you're wrong.
>> Sean: Phooey.
>>
>> My point is that even though I still object to this series, Paolo has final
>> say.
>
> I could be wrong, but I feel like Paolo was really respecting your input, as
> always.

I do respect Sean's input, but I also believe that in this case there's
three questions:

a) should KVM be allowed to use the equivalent of rdmsr*_safe() on guest
MSRs?  I say a mild yes, Sean says a strong no.

b) is it good to separate the "1" and "-EINVAL" results so that
ignore_msrs handling can be moved out of the MSR access functions?  I
say yes because KVM should never rely on ignore_msrs; Sean didn't say
anything (it's not too relevant if you answer no to the first question).

c) is it possible to reimplement TSX_CTRL_MSR to avoid using the
equivalent of rdmsr*_safe()?  Sean says yes and it's not really possible
to argue against that, but then it doesn't really matter if you answer
yes to the first two questions.

Sean sees your patch mostly as answering "yes" to the question (a), and
therefore disagrees with it.  I see your patch mostly as answering "yes"
to question (b), and therefore like it.  I would also accept a patch
that reimplements TSX_CTRL_MSR (question c), but I consider your patch
to be an improvement anyway (question b).

> It's just as simple as a 2:1 vote, isn't it? (I can still count myself
> in for the vote, right? :)

I do have the final say but I try to use that as little as possible (or
never).  And then it happens that ever so rare disagreements cluster in
the same week!

The important thing is to analyze the source of the disagreement.
Usually when that happens, it's because a change has multiple purposes
and people see it in a different way.

In this case, I'm happy to accept this patch (and overrule Sean) not
because he's wrong on question (a), but because in my opinion the actual
motivation of the patch is question (b).

To be fair, I would prefer it if ignore_msrs didn't apply to
host-initiated MSR accesses at all (only guest accesses).  That would
make this series much much simpler.  It wouldn't solve the disagreement
on question (a), but perhaps it would be a patch that Sean would agree on.

Thanks,

Paolo

> Btw, you didn't reply to my other email:
> 
>   https://lore.kernel.org/kvm/20200626191118.GC175520@xz-x1/
> 
> Let me change the question a bit - Do you think e.g. we should never use
> rdmsr*_safe() in the Linux kernel as long as the MSR has a bit somewhere
> telling whether the MSR exists (so we should never trigger #GP on these MSRs)?
> I think it's a similar question that we're discussing here..
Sean Christopherson July 10, 2020, 4:58 a.m. UTC | #22
On Fri, Jul 10, 2020 at 12:11:54AM +0200, Paolo Bonzini wrote:
> On 09/07/20 23:50, Peter Xu wrote:
> >> Sean: Objection your honor.
> >> Paolo: Overruled, you're wrong.
> >> Sean: Phooey.
> >>
> >> My point is that even though I still object to this series, Paolo has final
> >> say.
> >
> > I could be wrong, but I feel like Paolo was really respecting your input, as
> > always.
> 
> I do respect Sean's input

Ya, my comments were in jest.  Sorry if I implied I was grumpy about Paolo
taking this patch, because I'm not.  Just stubborn :-)

> but I also believe that in this case there's three questions:
> 
> a) should KVM be allowed to use the equivalent of rdmsr*_safe() on guest
> MSRs?  I say a mild yes, Sean says a strong no.

It's more that I don't think host_initiated=true is the equivalent of
rdmsr_safe().  It kind of holds true for rdmsr, but that's most definitely
not the case for wrmsr where host_initiated=true completely changes what
is/isn't allowed.  And if using host_initiated=true for rdmsr is allowed,
then logically using it for wrmsr is also allowed.

> b) is it good to separate the "1" and "-EINVAL" results so that
> ignore_msrs handling can be moved out of the MSR access functions?  I
> say yes because KVM should never rely on ignore_msrs; Sean didn't say
> anything (it's not too relevant if you answer no to the first question).
> 
> c) is it possible to reimplement TSX_CTRL_MSR to avoid using the
> equivalent of rdmsr*_safe()?  Sean says yes and it's not really possible
> to argue against that, but then it doesn't really matter if you answer
> yes to the first two questions.
> 
> Sean sees your patch mostly as answering "yes" to the question (a), and
> therefore disagrees with it.  I see your patch mostly as answering "yes"
> to question (b), and therefore like it.  I would also accept a patch
> that reimplements TSX_CTRL_MSR (question c), but I consider your patch
> to be an improvement anyway (question b).
> 
> > It's just as simple as a 2:1 vote, isn't it? (I can still count myself
> > in for the vote, right? :)
> 
> I do have the final say but I try to use that as little as possible (or
> never).  And then it happens that ever so rare disagreements cluster in
> the same week!
> 
> The important thing is to analyze the source of the disagreement.
> Usually when that happens, it's because a change has multiple purposes
> and people see it in a different way.
> 
> In this case, I'm happy to accept this patch (and overrule Sean) not
> because he's wrong on question (a), but because in my opinion the actual
> motivation of the patch is question (b).
> 
> To be fair, I would prefer it if ignore_msrs didn't apply to
> host-initiated MSR accesses at all (only guest accesses).  That would
> make this series much much simpler.  It wouldn't solve the disagreement
> on question (a), but perhaps it would be a patch that Sean would agree on.

I think I could get behind that.  It shouldn't interfere with my crusade to
vanquish host_initiated :-)
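
(For reference, a rough sketch of that alternative, reusing the
kvm_msr_ignored_check() helper added by the patch below; an illustration of
the idea, not posted code:)

	/*
	 * Sketch: apply the ignore_msrs fallback only to guest-initiated
	 * accesses, so host/userspace accesses to unknown MSRs always fail.
	 */
	static int kvm_msr_ignored_check(struct kvm_vcpu *vcpu, u32 msr,
					 u64 data, bool write, bool host_initiated)
	{
		const char *op = write ? "wrmsr" : "rdmsr";

		if (!host_initiated && ignore_msrs) {
			if (report_ignored_msrs)
				vcpu_unimpl(vcpu, "ignored %s: 0x%x data 0x%llx\n",
					    op, msr, data);
			/* Mask the error for guest accesses only */
			return 0;
		}

		vcpu_debug_ratelimited(vcpu, "unhandled %s: 0x%x data 0x%llx\n",
				       op, msr, data);
		return 1;
	}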

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c17e6eb9ad43..b49eaf8a2ce5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -245,6 +245,29 @@  EXPORT_SYMBOL_GPL(x86_fpu_cache);
 
 static struct kmem_cache *x86_emulator_cache;
 
+/*
+ * When called, it means the previous get/set msr reached an invalid msr.
+ * Return 0 if we want to ignore/silence this failed msr access, or 1 if we want
+ * to fail the caller.
+ */
+static int kvm_msr_ignored_check(struct kvm_vcpu *vcpu, u32 msr,
+				 u64 data, bool write)
+{
+	const char *op = write ? "wrmsr" : "rdmsr";
+
+	if (ignore_msrs) {
+		if (report_ignored_msrs)
+			vcpu_unimpl(vcpu, "ignored %s: 0x%x data 0x%llx\n",
+				    op, msr, data);
+		/* Mask the error */
+		return 0;
+	} else {
+		vcpu_debug_ratelimited(vcpu, "unhandled %s: 0x%x data 0x%llx\n",
+				       op, msr, data);
+		return 1;
+	}
+}
+
 static struct kmem_cache *kvm_alloc_emulator_cache(void)
 {
 	unsigned int useroffset = offsetof(struct x86_emulate_ctxt, src);
@@ -1496,6 +1519,17 @@  static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 	return kvm_x86_ops.set_msr(vcpu, &msr);
 }
 
+static int kvm_set_msr_ignored_check(struct kvm_vcpu *vcpu,
+				     u32 index, u64 data, bool host_initiated)
+{
+	int ret = __kvm_set_msr(vcpu, index, data, host_initiated);
+
+	if (ret == KVM_MSR_RET_INVALID)
+		ret = kvm_msr_ignored_check(vcpu, index, data, true);
+
+	return ret;
+}
+
 /*
  * Read the MSR specified by @index into @data.  Select MSR specific fault
  * checks are bypassed if @host_initiated is %true.
@@ -1517,15 +1551,29 @@  int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 	return ret;
 }
 
+static int kvm_get_msr_ignored_check(struct kvm_vcpu *vcpu,
+				     u32 index, u64 *data, bool host_initiated)
+{
+	int ret = __kvm_get_msr(vcpu, index, data, host_initiated);
+
+	if (ret == KVM_MSR_RET_INVALID) {
+		/* Unconditionally clear *data for simplicity */
+		*data = 0;
+		ret = kvm_msr_ignored_check(vcpu, index, 0, false);
+	}
+
+	return ret;
+}
+
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 {
-	return __kvm_get_msr(vcpu, index, data, false);
+	return kvm_get_msr_ignored_check(vcpu, index, data, false);
 }
 EXPORT_SYMBOL_GPL(kvm_get_msr);
 
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 {
-	return __kvm_set_msr(vcpu, index, data, false);
+	return kvm_set_msr_ignored_check(vcpu, index, data, false);
 }
 EXPORT_SYMBOL_GPL(kvm_set_msr);
 
@@ -1621,12 +1669,12 @@  EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);
  */
 static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 {
-	return __kvm_get_msr(vcpu, index, data, true);
+	return kvm_get_msr_ignored_check(vcpu, index, data, true);
 }
 
 static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 {
-	return __kvm_set_msr(vcpu, index, *data, true);
+	return kvm_set_msr_ignored_check(vcpu, index, *data, true);
 }
 
 #ifdef CONFIG_X86_64
@@ -2977,17 +3025,7 @@  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return xen_hvm_config(vcpu, data);
 		if (kvm_pmu_is_valid_msr(vcpu, msr))
 			return kvm_pmu_set_msr(vcpu, msr_info);
-		if (!ignore_msrs) {
-			vcpu_debug_ratelimited(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
-				    msr, data);
-			return 1;
-		} else {
-			if (report_ignored_msrs)
-				vcpu_unimpl(vcpu,
-					"ignored wrmsr: 0x%x data 0x%llx\n",
-					msr, data);
-			break;
-		}
+		return KVM_MSR_RET_INVALID;
 	}
 	return 0;
 }
@@ -3234,17 +3272,7 @@  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	default:
 		if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
 			return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
-		if (!ignore_msrs) {
-			vcpu_debug_ratelimited(vcpu, "unhandled rdmsr: 0x%x\n",
-					       msr_info->index);
-			return 1;
-		} else {
-			if (report_ignored_msrs)
-				vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n",
-					msr_info->index);
-			msr_info->data = 0;
-		}
-		break;
+		return KVM_MSR_RET_INVALID;
 	}
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index b968acc0516f..62e3e896d059 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -359,4 +359,6 @@  void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu);
 
+#define  KVM_MSR_RET_INVALID  2
+
 #endif