kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints

Message ID	20191122234355.174998-1-jmattson@google.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=GKhs=ZO=vger.kernel.org=kvm-owner@kernel.org> Date: Fri, 22 Nov 2019 15:43:55 -0800 Message-Id: <20191122234355.174998-1-jmattson@google.com> Mime-Version: 1.0 Subject: [PATCH] kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints From: Jim Mattson <jmattson@google.com> To: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>, Sean Christopherson <sean.j.christopherson@intel.com> Cc: Haozhong Zhang <haozhong.zhang@intel.com>, Jim Mattson <jmattson@google.com> Content-Type: text/plain; charset="UTF-8" Sender: kvm-owner@vger.kernel.org Precedence: bulk
Series	kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints \| expand kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints

Jim Mattson Nov. 22, 2019, 11:43 p.m. UTC

Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest
MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
new constraints on the data values that kvm would accept for the guest
MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
opt-in via a new KVM capability, but they were applied
indiscriminately, breaking at least one existing hypervisor.

Relax the constraints to allow either or both of
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
enabled. This change is sufficient to fix the aforementioned breakage.

Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Liran Alon Nov. 22, 2019, 11:57 p.m. UTC | #1

> On 23 Nov 2019, at 1:43, Jim Mattson <jmattson@google.com> wrote:
> 
> Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest
> MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
> new constraints on the data values that kvm would accept for the guest
> MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
> opt-in via a new KVM capability, but they were applied
> indiscriminately, breaking at least one existing hypervisor.
> 
> Relax the constraints to allow either or both of
> FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
> enabled. This change is sufficient to fix the aforementioned breakage.
> 
> Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
> Signed-off-by: Jim Mattson <jmattson@google.com>

I suggest to also add a comment in code to clarify why we allow setting
FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX even though we expose a vCPU that doesn’t support Intel TXT.
(I think the compatibility to existing workloads that sets this blindly on boot is a legit reason. Just recommend documenting it.)

In addition, if the nested hypervisor which relies on this is public, please also mention it in commit message for reference.

Reviewed-by: Liran Alon <liran.alon@oracle.com>

> ---
> arch/x86/kvm/vmx/vmx.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 04a8212704c17..9f46023451810 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7097,10 +7097,12 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> 
> 	if (nested_vmx_allowed(vcpu))
> 		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
> +			FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX |
> 			FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
> 	else
> 		to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
> -			~FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
> +			~(FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX |
> +			  FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX);
> 
> 	if (nested_vmx_allowed(vcpu)) {
> 		nested_vmx_cr_fixed1_bits_update(vcpu);
> -- 
> 2.24.0.432.g9d3f5f5b63-goog
>

Jim Mattson Nov. 23, 2019, 12:22 a.m. UTC | #2

On Fri, Nov 22, 2019 at 3:57 PM Liran Alon <liran.alon@oracle.com> wrote:
>
>
>
> > On 23 Nov 2019, at 1:43, Jim Mattson <jmattson@google.com> wrote:
> >
> > Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest
> > MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
> > new constraints on the data values that kvm would accept for the guest
> > MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
> > opt-in via a new KVM capability, but they were applied
> > indiscriminately, breaking at least one existing hypervisor.
> >
> > Relax the constraints to allow either or both of
> > FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
> > FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
> > enabled. This change is sufficient to fix the aforementioned breakage.
> >
> > Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
> > Signed-off-by: Jim Mattson <jmattson@google.com>
>
> I suggest to also add a comment in code to clarify why we allow setting
> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX even though we expose a vCPU that doesn’t support Intel TXT.
> (I think the compatibility to existing workloads that sets this blindly on boot is a legit reason. Just recommend documenting it.)
>
> In addition, if the nested hypervisor which relies on this is public, please also mention it in commit message for reference.

It's not an L1 hypervisor that's the problem. It's Google's L0
hypervisor. We've been incorrectly reporting IA32_FEATURE_CONTROL as 7
to nested guests for years, and now we have thousands of running VMs
with the bogus value. I've thought about just changing it to 5 on the
fly (on real hardware, one could almost blame it on SMM, but the MSR
is *locked*, after all).

> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 04a8212704c17..9f46023451810 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7097,10 +7097,12 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
> >
> >       if (nested_vmx_allowed(vcpu))
> >               to_vmx(vcpu)->msr_ia32_feature_control_valid_bits |=
> > +                     FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX |
> >                       FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
> >       else
> >               to_vmx(vcpu)->msr_ia32_feature_control_valid_bits &=
> > -                     ~FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX;
> > +                     ~(FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX |
> > +                       FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX);
> >
> >       if (nested_vmx_allowed(vcpu)) {
> >               nested_vmx_cr_fixed1_bits_update(vcpu);
> > --
> > 2.24.0.432.g9d3f5f5b63-goog
> >
>

Liran Alon Nov. 23, 2019, 1:49 a.m. UTC | #3

> On 23 Nov 2019, at 2:22, Jim Mattson <jmattson@google.com> wrote:
> 
> On Fri, Nov 22, 2019 at 3:57 PM Liran Alon <liran.alon@oracle.com> wrote:
>> 
>> 
>>> On 23 Nov 2019, at 1:43, Jim Mattson <jmattson@google.com> wrote:
>>> 
>>> Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest
>>> MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
>>> new constraints on the data values that kvm would accept for the guest
>>> MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
>>> opt-in via a new KVM capability, but they were applied
>>> indiscriminately, breaking at least one existing hypervisor.
>>> 
>>> Relax the constraints to allow either or both of
>>> FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
>>> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
>>> enabled. This change is sufficient to fix the aforementioned breakage.
>>> 
>>> Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
>>> Signed-off-by: Jim Mattson <jmattson@google.com>
>> 
>> I suggest to also add a comment in code to clarify why we allow setting
>> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX even though we expose a vCPU that doesn’t support Intel TXT.
>> (I think the compatibility to existing workloads that sets this blindly on boot is a legit reason. Just recommend documenting it.)
>> 
>> In addition, if the nested hypervisor which relies on this is public, please also mention it in commit message for reference.
> 
> It's not an L1 hypervisor that's the problem. It's Google's L0
> hypervisor. We've been incorrectly reporting IA32_FEATURE_CONTROL as 7
> to nested guests for years, and now we have thousands of running VMs
> with the bogus value. I've thought about just changing it to 5 on the
> fly (on real hardware, one could almost blame it on SMM, but the MSR
> is *locked*, after all).

If I understand correctly, you are talking about the case a VM is already running and you will
perform Live-Migration on it to a new host with a new kernel that it’s KVM don’t allow to
set FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX.

If we haven’t encountered yet some guest workload that blindly sets this bit in IA32_FEATURE_CONTROL,
then it should be sufficient for you to modify vmx_set_msr() to allow setting this bit in case msr_info->host_initiated
(i.e. During restore of VM state on destination) but disallow it when WRMSR is initiated from guest.
The behaviour of whether vmx_set_msr() allows host to set this MSR to unsupported can even be gated by an opt-in KVM cap
that will be set by Google’s userspace VMM.

That way, you won’t change change guest runtime semantics (it’s original locked MSR value will be preserved during Live-Migration),
and you will disallow newly provisioned guests from setting IA32_FEATURE_CONTROL to an unsupported value.

What do you think?

-Liran

Paolo Bonzini Nov. 25, 2019, 12:37 p.m. UTC | #4

On 23/11/19 01:22, Jim Mattson wrote:
>> I suggest to also add a comment in code to clarify why we allow setting
>> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX even though we expose a
>> vCPU that doesn’t support Intel TXT.
>> (I think the compatibility to existing workloads that sets this
>> blindly on boot is a legit reason. Just recommend documenting it.)
>>
>> In addition, if the nested hypervisor which relies on this is
>> public, please also mention it in commit message for reference.
>
> It's not an L1 hypervisor that's the problem. It's Google's L0
> hypervisor. We've been incorrectly reporting IA32_FEATURE_CONTROL as 7
> to nested guests for years, and now we have thousands of running VMs
> with the bogus value. I've thought about just changing it to 5 on the
> fly (on real hardware, one could almost blame it on SMM, but the MSR
> is *locked*, after all).

Queued, thanks.

Paolo

Jim Mattson Nov. 25, 2019, 5:45 p.m. UTC | #5

On Fri, Nov 22, 2019 at 5:49 PM Liran Alon <liran.alon@oracle.com> wrote:
>
>
>
> > On 23 Nov 2019, at 2:22, Jim Mattson <jmattson@google.com> wrote:
> >
> > On Fri, Nov 22, 2019 at 3:57 PM Liran Alon <liran.alon@oracle.com> wrote:
> >>
> >>
> >>> On 23 Nov 2019, at 1:43, Jim Mattson <jmattson@google.com> wrote:
> >>>
> >>> Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest
> >>> MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting
> >>> new constraints on the data values that kvm would accept for the guest
> >>> MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been
> >>> opt-in via a new KVM capability, but they were applied
> >>> indiscriminately, breaking at least one existing hypervisor.
> >>>
> >>> Relax the constraints to allow either or both of
> >>> FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and
> >>> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is
> >>> enabled. This change is sufficient to fix the aforementioned breakage.
> >>>
> >>> Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL")
> >>> Signed-off-by: Jim Mattson <jmattson@google.com>
> >>
> >> I suggest to also add a comment in code to clarify why we allow setting
> >> FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX even though we expose a vCPU that doesn’t support Intel TXT.
> >> (I think the compatibility to existing workloads that sets this blindly on boot is a legit reason. Just recommend documenting it.)
> >>
> >> In addition, if the nested hypervisor which relies on this is public, please also mention it in commit message for reference.
> >
> > It's not an L1 hypervisor that's the problem. It's Google's L0
> > hypervisor. We've been incorrectly reporting IA32_FEATURE_CONTROL as 7
> > to nested guests for years, and now we have thousands of running VMs
> > with the bogus value. I've thought about just changing it to 5 on the
> > fly (on real hardware, one could almost blame it on SMM, but the MSR
> > is *locked*, after all).
>
> If I understand correctly, you are talking about the case a VM is already running and you will
> perform Live-Migration on it to a new host with a new kernel that it’s KVM don’t allow to
> set FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX.
>
> If we haven’t encountered yet some guest workload that blindly sets this bit in IA32_FEATURE_CONTROL,
> then it should be sufficient for you to modify vmx_set_msr() to allow setting this bit in case msr_info->host_initiated
> (i.e. During restore of VM state on destination) but disallow it when WRMSR is initiated from guest.
> The behaviour of whether vmx_set_msr() allows host to set this MSR to unsupported can even be gated by an opt-in KVM cap
> that will be set by Google’s userspace VMM.

I would call that an opt-out cap, rather than an opt-in cap. That is,
we'd like to opt-out from the ABI change. I was going to go ahead and
do it, but it looks like Paolo has accepted the change as it is.

Thanks, Paolo!

> That way, you won’t change change guest runtime semantics (it’s original locked MSR value will be preserved during Live-Migration),
> and you will disallow newly provisioned guests from setting IA32_FEATURE_CONTROL to an unsupported value.
>
> What do you think?
>
> -Liran
>
>
>
>
>

Paolo Bonzini Nov. 25, 2019, 6:14 p.m. UTC | #6

On 25/11/19 18:45, Jim Mattson wrote:
> I would call that an opt-out cap, rather than an opt-in cap. That is,
> we'd like to opt-out from the ABI change. I was going to go ahead and
> do it, but it looks like Paolo has accepted the change as it is.

Yes, your experience has proved that guests are not annoyed by the
discrepancy between VMX_INSIDE_SMX and the actual lack of SMX, so it's
pointless to complicate the code.

Paolo

Sean Christopherson Nov. 25, 2019, 6:35 p.m. UTC | #7

On Mon, Nov 25, 2019 at 07:14:15PM +0100, Paolo Bonzini wrote:
> On 25/11/19 18:45, Jim Mattson wrote:
> > I would call that an opt-out cap, rather than an opt-in cap. That is,
> > we'd like to opt-out from the ABI change. I was going to go ahead and
> > do it, but it looks like Paolo has accepted the change as it is.
> 
> Yes, your experience has proved that guests are not annoyed by the
> discrepancy between VMX_INSIDE_SMX and the actual lack of SMX, so it's
> pointless to complicate the code.

Color me surprised that no guest cares about SMX :-)

kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints

Commit Message

Comments

Patch