nVMX: Defer error from VM-entry MSR-load area to until after hardware verifies VMCS guest state-area
diff mbox series

Message ID 20190930233626.22852-2-krish.sadhukhan@oracle.com
State New
Headers show
Series
  • nVMX: Defer error from VM-entry MSR-load area to until after hardware verifies VMCS guest state-area
Related show

Commit Message

Krish Sadhukhan Sept. 30, 2019, 11:36 p.m. UTC
According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
performed in a certain order. Checks on MSRs that are loaded on VM-entry
from VM-entry MSR-load area, should be done after verifying VMCS controls,
host-state area and guest-state area. As KVM relies on CPU hardware to
perform some of these checks, we need to defer VM-exit due to invalid
VM-entry MSR-load area to until after CPU hardware completes the earlier
checks and is ready to do VMLAUNCH/VMRESUME.

In order to defer errors arising from invalid VM-entry MSR-load area in
vmcs12, we set up a single invalid entry, which is illegal according to
section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of
vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure
due to MSR loading" after it completes checks on VMCS controls, host-state
area and guest-state area. We reflect a synthesized Exit Qualification to
our guest.

Suggested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
---
 arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
 arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
 arch/x86/kvm/vmx/vmcs.h   |  6 ++++++
 3 files changed, 49 insertions(+), 5 deletions(-)

Comments

Jim Mattson Oct. 1, 2019, 5:09 p.m. UTC | #1
On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
> According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
> performed in a certain order. Checks on MSRs that are loaded on VM-entry
> from VM-entry MSR-load area, should be done after verifying VMCS controls,
> host-state area and guest-state area. As KVM relies on CPU hardware to
> perform some of these checks, we need to defer VM-exit due to invalid
> VM-entry MSR-load area to until after CPU hardware completes the earlier
> checks and is ready to do VMLAUNCH/VMRESUME.
>
> In order to defer errors arising from invalid VM-entry MSR-load area in
> vmcs12, we set up a single invalid entry, which is illegal according to
> section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of
> vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure
> due to MSR loading" after it completes checks on VMCS controls, host-state
> area and guest-state area. We reflect a synthesized Exit Qualification to
> our guest.

This change addresses the priority inversion, but it still potentially
leaves guest MSRs incorrectly loaded with values from the VMCS12
VM-entry MSR-load area when a higher priority error condition would
have precluded any processing of the VM-entry MSR-load area.

> Suggested-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
>  arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
>  arch/x86/kvm/vmx/vmcs.h   |  6 ++++++
>  3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index ced9fba32598..b74491c04090 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>                 goto vmentry_fail_vmexit_guest_mode;
>
>         if (from_vmentry) {
> -               exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
>                 exit_qual = nested_vmx_load_msr(vcpu,
>                                                 vmcs12->vm_entry_msr_load_addr,
>                                                 vmcs12->vm_entry_msr_load_count);
> -               if (exit_qual)
> -                       goto vmentry_fail_vmexit_guest_mode;
> +               if (exit_qual) {
> +                       /*
> +                        * According to section “VM Entries” in Intel SDM
> +                        * vol 3C, VM-entry checks are performed in a certain
> +                        * order. Checks on MSRs that are loaded on VM-entry
> +                        * from VM-entry MSR-load area, should be done after
> +                        * verifying VMCS controls, host-state area and
> +                        * guest-state area. As KVM relies on CPU hardware to
> +                        * perform some of these checks, we need to defer
> +                        * VM-exit due to invalid VM-entry MSR-load area to
> +                        * until after CPU hardware completes the earlier
> +                        * checks and is ready to do VMLAUNCH/VMRESUME.
> +                        *
> +                        * In order to defer errors arising from invalid
> +                        * VM-entry MSR-load area in vmcs12, we set up a
> +                        * single invalid entry, which is illegal according
> +                        * to section "Loading MSRs in Intel SDM vol 3C, in
> +                        * VM-entry MSR-load area of vmcs02. This will cause
> +                        * the CPU hardware to VM-exit with "VM-entry
> +                        * failure due to MSR loading" after it completes
> +                        * checks on VMCS controls, host-state area and
> +                        * guest-state area.
> +                        */
> +                       vmx->loaded_vmcs->invalid_msr_load_area.index =
> +                           MSR_FS_BASE;

Can this field be statically populated during initialization?

> +                       vmx->loaded_vmcs->invalid_msr_load_area.value =
> +                           exit_qual;

This seems awkward. Why not save 16 bytes per loaded_vmcs by
allocating one invalid_msr_load_area system-wide and just add a 4 byte
field to struct nested_vmx to store this value?

> +                       vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
> +                       vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
> +                           __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
> +               }

Do you need to set vmx->nested.dirty_vmcs12 to ensure that
prepare_vmcs02_constant_state() will be called at the next emulated
VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR?

>         } else {
>                 /*
>                  * The MMU is not initialized to point at the right entities yet and
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index 187d39bf0bf1..f3a384235b68 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
>  static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>                                             u32 exit_reason)
>  {
> +       u32 exit_qual;
>         u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>         /*
>          * At this point, the exit interruption info in exit_intr_info
> @@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>                         vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
>         }
>
> -       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
> -                         vmcs_readl(EXIT_QUALIFICATION));
> +       exit_qual = vmcs_readl(EXIT_QUALIFICATION);
> +
> +       if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
> +           (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
> +                           EXIT_REASON_MSR_LOAD_FAIL))) {

Is the second conjunct sufficient? i.e. Isn't there a bug in kvm if
the second conjunct is true and the first is not?

> +               exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
> +       }
> +
> +       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
> +
>         return 1;
>  }
>
> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
> index 481ad879197b..e272788bd4b8 100644
> --- a/arch/x86/kvm/vmx/vmcs.h
> +++ b/arch/x86/kvm/vmx/vmcs.h
> @@ -70,6 +70,12 @@ struct loaded_vmcs {
>         struct list_head loaded_vmcss_on_cpu_link;
>         struct vmcs_host_state host_state;
>         struct vmcs_controls_shadow controls_shadow;
> +       /*
> +        * This field is used to set up an invalid VM-entry MSR-load area
> +        * for vmcs02 if an error is detected while processing the entries
> +        * in VM-entry MSR-load area of vmcs12.
> +        */
> +       struct vmx_msr_entry invalid_msr_load_area;
>  };

I'd suggest allocating just one invalid_msr_load_area system-wide, as
mentioned above.

>  static inline bool is_exception_n(u32 intr_info, u8 vector)
> --
> 2.20.1
>
Krish Sadhukhan Oct. 1, 2019, 9:21 p.m. UTC | #2
On 10/01/2019 10:09 AM, Jim Mattson wrote:
> On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com> wrote:
>> According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
>> performed in a certain order. Checks on MSRs that are loaded on VM-entry
>> from VM-entry MSR-load area, should be done after verifying VMCS controls,
>> host-state area and guest-state area. As KVM relies on CPU hardware to
>> perform some of these checks, we need to defer VM-exit due to invalid
>> VM-entry MSR-load area to until after CPU hardware completes the earlier
>> checks and is ready to do VMLAUNCH/VMRESUME.
>>
>> In order to defer errors arising from invalid VM-entry MSR-load area in
>> vmcs12, we set up a single invalid entry, which is illegal according to
>> section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of
>> vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure
>> due to MSR loading" after it completes checks on VMCS controls, host-state
>> area and guest-state area. We reflect a synthesized Exit Qualification to
>> our guest.
> This change addresses the priority inversion, but it still potentially
> leaves guest MSRs incorrectly loaded with values from the VMCS12
> VM-entry MSR-load area when a higher priority error condition would
> have precluded any processing of the VM-entry MSR-load area.

May be, we should not load any guest MSR until we have checked the 
entire vmcs12 MSR-load area in nested_vmx_load_msr()  ?

>
>> Suggested-by: Jim Mattson <jmattson@google.com>
>> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
>> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>> ---
>>   arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
>>   arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
>>   arch/x86/kvm/vmx/vmcs.h   |  6 ++++++
>>   3 files changed, 49 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index ced9fba32598..b74491c04090 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>>                  goto vmentry_fail_vmexit_guest_mode;
>>
>>          if (from_vmentry) {
>> -               exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
>>                  exit_qual = nested_vmx_load_msr(vcpu,
>>                                                  vmcs12->vm_entry_msr_load_addr,
>>                                                  vmcs12->vm_entry_msr_load_count);
>> -               if (exit_qual)
>> -                       goto vmentry_fail_vmexit_guest_mode;
>> +               if (exit_qual) {
>> +                       /*
>> +                        * According to section “VM Entries” in Intel SDM
>> +                        * vol 3C, VM-entry checks are performed in a certain
>> +                        * order. Checks on MSRs that are loaded on VM-entry
>> +                        * from VM-entry MSR-load area, should be done after
>> +                        * verifying VMCS controls, host-state area and
>> +                        * guest-state area. As KVM relies on CPU hardware to
>> +                        * perform some of these checks, we need to defer
>> +                        * VM-exit due to invalid VM-entry MSR-load area to
>> +                        * until after CPU hardware completes the earlier
>> +                        * checks and is ready to do VMLAUNCH/VMRESUME.
>> +                        *
>> +                        * In order to defer errors arising from invalid
>> +                        * VM-entry MSR-load area in vmcs12, we set up a
>> +                        * single invalid entry, which is illegal according
>> +                        * to section "Loading MSRs in Intel SDM vol 3C, in
>> +                        * VM-entry MSR-load area of vmcs02. This will cause
>> +                        * the CPU hardware to VM-exit with "VM-entry
>> +                        * failure due to MSR loading" after it completes
>> +                        * checks on VMCS controls, host-state area and
>> +                        * guest-state area.
>> +                        */
>> +                       vmx->loaded_vmcs->invalid_msr_load_area.index =
>> +                           MSR_FS_BASE;
> Can this field be statically populated during initialization?

Yes.

>
>> +                       vmx->loaded_vmcs->invalid_msr_load_area.value =
>> +                           exit_qual;
> This seems awkward. Why not save 16 bytes per loaded_vmcs by
> allocating one invalid_msr_load_area system-wide and just add a 4 byte
> field to struct nested_vmx to store this value?

OK.

>
>> +                       vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
>> +                       vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
>> +                           __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
>> +               }
> Do you need to set vmx->nested.dirty_vmcs12 to ensure that
> prepare_vmcs02_constant_state() will be called at the next emulated
> VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR?

Yes.

>
>>          } else {
>>                  /*
>>                   * The MMU is not initialized to point at the right entities yet and
>> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
>> index 187d39bf0bf1..f3a384235b68 100644
>> --- a/arch/x86/kvm/vmx/nested.h
>> +++ b/arch/x86/kvm/vmx/nested.h
>> @@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
>>   static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>>                                              u32 exit_reason)
>>   {
>> +       u32 exit_qual;
>>          u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
>>
>>          /*
>>           * At this point, the exit interruption info in exit_intr_info
>> @@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>>                          vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
>>          }
>>
>> -       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
>> -                         vmcs_readl(EXIT_QUALIFICATION));
>> +       exit_qual = vmcs_readl(EXIT_QUALIFICATION);
>> +
>> +       if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
>> +           (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
>> +                           EXIT_REASON_MSR_LOAD_FAIL))) {
> Is the second conjunct sufficient? i.e. Isn't there a bug in kvm if
> the second conjunct is true and the first is not?

If the first conjunct isn't true and the second one is, it means it's a 
hardware-detected MSR-load error. Right ? So won't that be handled the 
same way it is handled currently in KVM ?

>
>> +               exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
>> +       }
>> +
>> +       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
>> +
>>          return 1;
>>   }
>>
>> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
>> index 481ad879197b..e272788bd4b8 100644
>> --- a/arch/x86/kvm/vmx/vmcs.h
>> +++ b/arch/x86/kvm/vmx/vmcs.h
>> @@ -70,6 +70,12 @@ struct loaded_vmcs {
>>          struct list_head loaded_vmcss_on_cpu_link;
>>          struct vmcs_host_state host_state;
>>          struct vmcs_controls_shadow controls_shadow;
>> +       /*
>> +        * This field is used to set up an invalid VM-entry MSR-load area
>> +        * for vmcs02 if an error is detected while processing the entries
>> +        * in VM-entry MSR-load area of vmcs12.
>> +        */
>> +       struct vmx_msr_entry invalid_msr_load_area;
>>   };
> I'd suggest allocating just one invalid_msr_load_area system-wide, as
> mentioned above.
>
>>   static inline bool is_exception_n(u32 intr_info, u8 vector)
>> --
>> 2.20.1
>>
Jim Mattson Oct. 1, 2019, 10:02 p.m. UTC | #3
On Tue, Oct 1, 2019 at 2:23 PM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
>
> On 10/01/2019 10:09 AM, Jim Mattson wrote:
> > On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan
> > <krish.sadhukhan@oracle.com> wrote:
> >> According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
> >> performed in a certain order. Checks on MSRs that are loaded on VM-entry
> >> from VM-entry MSR-load area, should be done after verifying VMCS controls,
> >> host-state area and guest-state area. As KVM relies on CPU hardware to
> >> perform some of these checks, we need to defer VM-exit due to invalid
> >> VM-entry MSR-load area to until after CPU hardware completes the earlier
> >> checks and is ready to do VMLAUNCH/VMRESUME.
> >>
> >> In order to defer errors arising from invalid VM-entry MSR-load area in
> >> vmcs12, we set up a single invalid entry, which is illegal according to
> >> section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of
> >> vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure
> >> due to MSR loading" after it completes checks on VMCS controls, host-state
> >> area and guest-state area. We reflect a synthesized Exit Qualification to
> >> our guest.
> > This change addresses the priority inversion, but it still potentially
> > leaves guest MSRs incorrectly loaded with values from the VMCS12
> > VM-entry MSR-load area when a higher priority error condition would
> > have precluded any processing of the VM-entry MSR-load area.
>
> May be, we should not load any guest MSR until we have checked the
> entire vmcs12 MSR-load area in nested_vmx_load_msr()  ?

That's not sufficient. We shouldn't load any guest MSR until all of
the checks on host state, controls, and guest state have passed. Since
we defer the guest state checks to hardware, that makes the proper
ordering a bit problematic. Instead, you can try to arrange to undo
the guest MSR loads if a higher priority error is discovered. That
won't be trivial.

> >
> >> Suggested-by: Jim Mattson <jmattson@google.com>
> >> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
> >> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
> >> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> >> ---
> >>   arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
> >>   arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
> >>   arch/x86/kvm/vmx/vmcs.h   |  6 ++++++
> >>   3 files changed, 49 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> >> index ced9fba32598..b74491c04090 100644
> >> --- a/arch/x86/kvm/vmx/nested.c
> >> +++ b/arch/x86/kvm/vmx/nested.c
> >> @@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
> >>                  goto vmentry_fail_vmexit_guest_mode;
> >>
> >>          if (from_vmentry) {
> >> -               exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
> >>                  exit_qual = nested_vmx_load_msr(vcpu,
> >>                                                  vmcs12->vm_entry_msr_load_addr,
> >>                                                  vmcs12->vm_entry_msr_load_count);
> >> -               if (exit_qual)
> >> -                       goto vmentry_fail_vmexit_guest_mode;
> >> +               if (exit_qual) {
> >> +                       /*
> >> +                        * According to section “VM Entries” in Intel SDM
> >> +                        * vol 3C, VM-entry checks are performed in a certain
> >> +                        * order. Checks on MSRs that are loaded on VM-entry
> >> +                        * from VM-entry MSR-load area, should be done after
> >> +                        * verifying VMCS controls, host-state area and
> >> +                        * guest-state area. As KVM relies on CPU hardware to
> >> +                        * perform some of these checks, we need to defer
> >> +                        * VM-exit due to invalid VM-entry MSR-load area to
> >> +                        * until after CPU hardware completes the earlier
> >> +                        * checks and is ready to do VMLAUNCH/VMRESUME.
> >> +                        *
> >> +                        * In order to defer errors arising from invalid
> >> +                        * VM-entry MSR-load area in vmcs12, we set up a
> >> +                        * single invalid entry, which is illegal according
> >> +                        * to section "Loading MSRs in Intel SDM vol 3C, in
> >> +                        * VM-entry MSR-load area of vmcs02. This will cause
> >> +                        * the CPU hardware to VM-exit with "VM-entry
> >> +                        * failure due to MSR loading" after it completes
> >> +                        * checks on VMCS controls, host-state area and
> >> +                        * guest-state area.
> >> +                        */
> >> +                       vmx->loaded_vmcs->invalid_msr_load_area.index =
> >> +                           MSR_FS_BASE;
> > Can this field be statically populated during initialization?
>
> Yes.
>
> >
> >> +                       vmx->loaded_vmcs->invalid_msr_load_area.value =
> >> +                           exit_qual;
> > This seems awkward. Why not save 16 bytes per loaded_vmcs by
> > allocating one invalid_msr_load_area system-wide and just add a 4 byte
> > field to struct nested_vmx to store this value?
>
> OK.
>
> >
> >> +                       vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
> >> +                       vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
> >> +                           __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
> >> +               }
> > Do you need to set vmx->nested.dirty_vmcs12 to ensure that
> > prepare_vmcs02_constant_state() will be called at the next emulated
> > VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR?
>
> Yes.
>
> >
> >>          } else {
> >>                  /*
> >>                   * The MMU is not initialized to point at the right entities yet and
> >> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> >> index 187d39bf0bf1..f3a384235b68 100644
> >> --- a/arch/x86/kvm/vmx/nested.h
> >> +++ b/arch/x86/kvm/vmx/nested.h
> >> @@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
> >>   static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
> >>                                              u32 exit_reason)
> >>   {
> >> +       u32 exit_qual;
> >>          u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> >> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> >>
> >>          /*
> >>           * At this point, the exit interruption info in exit_intr_info
> >> @@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
> >>                          vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
> >>          }
> >>
> >> -       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
> >> -                         vmcs_readl(EXIT_QUALIFICATION));
> >> +       exit_qual = vmcs_readl(EXIT_QUALIFICATION);
> >> +
> >> +       if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
> >> +           (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
> >> +                           EXIT_REASON_MSR_LOAD_FAIL))) {
> > Is the second conjunct sufficient? i.e. Isn't there a bug in kvm if
> > the second conjunct is true and the first is not?
>
> If the first conjunct isn't true and the second one is, it means it's a
> hardware-detected MSR-load error. Right ? So won't that be handled the
> same way it is handled currently in KVM ?

I believe that KVM will currently forward such an error to L1, which
is wrong, since L1 has no control over vmcs02's VM-entry MSR-load
area. The assumption is that this will never happen, because L0 is too
careful.

> >
> >> +               exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
> >> +       }
> >> +
> >> +       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
> >> +
> >>          return 1;
> >>   }
> >>
> >> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
> >> index 481ad879197b..e272788bd4b8 100644
> >> --- a/arch/x86/kvm/vmx/vmcs.h
> >> +++ b/arch/x86/kvm/vmx/vmcs.h
> >> @@ -70,6 +70,12 @@ struct loaded_vmcs {
> >>          struct list_head loaded_vmcss_on_cpu_link;
> >>          struct vmcs_host_state host_state;
> >>          struct vmcs_controls_shadow controls_shadow;
> >> +       /*
> >> +        * This field is used to set up an invalid VM-entry MSR-load area
> >> +        * for vmcs02 if an error is detected while processing the entries
> >> +        * in VM-entry MSR-load area of vmcs12.
> >> +        */
> >> +       struct vmx_msr_entry invalid_msr_load_area;
> >>   };
> > I'd suggest allocating just one invalid_msr_load_area system-wide, as
> > mentioned above.
> >
> >>   static inline bool is_exception_n(u32 intr_info, u8 vector)
> >> --
> >> 2.20.1
> >>
>
Krish Sadhukhan Oct. 2, 2019, 5:16 p.m. UTC | #4
On 10/1/19 3:02 PM, Jim Mattson wrote:
> On Tue, Oct 1, 2019 at 2:23 PM Krish Sadhukhan
> <krish.sadhukhan@oracle.com>  wrote:
>>
>> On 10/01/2019 10:09 AM, Jim Mattson wrote:
>>> On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan
>>> <krish.sadhukhan@oracle.com>  wrote:
>>>> According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
>>>> performed in a certain order. Checks on MSRs that are loaded on VM-entry
>>>> from VM-entry MSR-load area, should be done after verifying VMCS controls,
>>>> host-state area and guest-state area. As KVM relies on CPU hardware to
>>>> perform some of these checks, we need to defer VM-exit due to invalid
>>>> VM-entry MSR-load area to until after CPU hardware completes the earlier
>>>> checks and is ready to do VMLAUNCH/VMRESUME.
>>>>
>>>> In order to defer errors arising from invalid VM-entry MSR-load area in
>>>> vmcs12, we set up a single invalid entry, which is illegal according to
>>>> section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of
>>>> vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure
>>>> due to MSR loading" after it completes checks on VMCS controls, host-state
>>>> area and guest-state area. We reflect a synthesized Exit Qualification to
>>>> our guest.
>>> This change addresses the priority inversion, but it still potentially
>>> leaves guest MSRs incorrectly loaded with values from the VMCS12
>>> VM-entry MSR-load area when a higher priority error condition would
>>> have precluded any processing of the VM-entry MSR-load area.
>> May be, we should not load any guest MSR until we have checked the
>> entire vmcs12 MSR-load area in nested_vmx_load_msr()  ?
> That's not sufficient. We shouldn't load any guest MSR until all of
> the checks on host state, controls, and guest state have passed. Since
> we defer the guest state checks to hardware, that makes the proper
> ordering a bit problematic. Instead, you can try to arrange to undo
> the guest MSR loads if a higher priority error is discovered. That
> won't be trivial.


We discussed about undoing guest MSR loads in an earlier thread. You 
said that some MSR updates couldn't be rolled back. If we can't rollback 
some MSR updates and if that leads to an undefined guest configuration, 
may be we should return to L0 and mark it as a hardware failure, like 
what we currently do for some VM-entry failures in vmx_handle_exit():


     if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
                 dump_vmcs();
                 vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
vcpu->run->fail_entry.hardware_entry_failure_reason
                         = exit_reason;
                 return 0;
         }


>>>> Suggested-by: Jim Mattson<jmattson@google.com>
>>>> Signed-off-by: Krish Sadhukhan<krish.sadhukhan@oracle.com>
>>>> Reviewed-by: Mihai Carabas<mihai.carabas@oracle.com>
>>>> Reviewed-by: Liran Alon<liran.alon@oracle.com>
>>>> ---
>>>>    arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
>>>>    arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
>>>>    arch/x86/kvm/vmx/vmcs.h   |  6 ++++++
>>>>    3 files changed, 49 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>>>> index ced9fba32598..b74491c04090 100644
>>>> --- a/arch/x86/kvm/vmx/nested.c
>>>> +++ b/arch/x86/kvm/vmx/nested.c
>>>> @@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>>>>                   goto vmentry_fail_vmexit_guest_mode;
>>>>
>>>>           if (from_vmentry) {
>>>> -               exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
>>>>                   exit_qual = nested_vmx_load_msr(vcpu,
>>>>                                                   vmcs12->vm_entry_msr_load_addr,
>>>>                                                   vmcs12->vm_entry_msr_load_count);
>>>> -               if (exit_qual)
>>>> -                       goto vmentry_fail_vmexit_guest_mode;
>>>> +               if (exit_qual) {
>>>> +                       /*
>>>> +                        * According to section “VM Entries” in Intel SDM
>>>> +                        * vol 3C, VM-entry checks are performed in a certain
>>>> +                        * order. Checks on MSRs that are loaded on VM-entry
>>>> +                        * from VM-entry MSR-load area, should be done after
>>>> +                        * verifying VMCS controls, host-state area and
>>>> +                        * guest-state area. As KVM relies on CPU hardware to
>>>> +                        * perform some of these checks, we need to defer
>>>> +                        * VM-exit due to invalid VM-entry MSR-load area to
>>>> +                        * until after CPU hardware completes the earlier
>>>> +                        * checks and is ready to do VMLAUNCH/VMRESUME.
>>>> +                        *
>>>> +                        * In order to defer errors arising from invalid
>>>> +                        * VM-entry MSR-load area in vmcs12, we set up a
>>>> +                        * single invalid entry, which is illegal according
>>>> +                        * to section "Loading MSRs in Intel SDM vol 3C, in
>>>> +                        * VM-entry MSR-load area of vmcs02. This will cause
>>>> +                        * the CPU hardware to VM-exit with "VM-entry
>>>> +                        * failure due to MSR loading" after it completes
>>>> +                        * checks on VMCS controls, host-state area and
>>>> +                        * guest-state area.
>>>> +                        */
>>>> +                       vmx->loaded_vmcs->invalid_msr_load_area.index =
>>>> +                           MSR_FS_BASE;
>>> Can this field be statically populated during initialization?
>> Yes.
>>
>>>> +                       vmx->loaded_vmcs->invalid_msr_load_area.value =
>>>> +                           exit_qual;
>>> This seems awkward. Why not save 16 bytes per loaded_vmcs by
>>> allocating one invalid_msr_load_area system-wide and just add a 4 byte
>>> field to struct nested_vmx to store this value?
>> OK.
>>
>>>> +                       vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
>>>> +                       vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
>>>> +                           __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
>>>> +               }
>>> Do you need to set vmx->nested.dirty_vmcs12 to ensure that
>>> prepare_vmcs02_constant_state() will be called at the next emulated
>>> VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR?
>> Yes.
>>
>>>>           } else {
>>>>                   /*
>>>>                    * The MMU is not initialized to point at the right entities yet and
>>>> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
>>>> index 187d39bf0bf1..f3a384235b68 100644
>>>> --- a/arch/x86/kvm/vmx/nested.h
>>>> +++ b/arch/x86/kvm/vmx/nested.h
>>>> @@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
>>>>    static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>>>>                                               u32 exit_reason)
>>>>    {
>>>> +       u32 exit_qual;
>>>>           u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
>>>> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>
>>>>           /*
>>>>            * At this point, the exit interruption info in exit_intr_info
>>>> @@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>>>>                           vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
>>>>           }
>>>>
>>>> -       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
>>>> -                         vmcs_readl(EXIT_QUALIFICATION));
>>>> +       exit_qual = vmcs_readl(EXIT_QUALIFICATION);
>>>> +
>>>> +       if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
>>>> +           (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
>>>> +                           EXIT_REASON_MSR_LOAD_FAIL))) {
>>> Is the second conjunct sufficient? i.e. Isn't there a bug in kvm if
>>> the second conjunct is true and the first is not?
>> If the first conjunct isn't true and the second one is, it means it's a
>> hardware-detected MSR-load error. Right ? So won't that be handled the
>> same way it is handled currently in KVM ?
> I believe that KVM will currently forward such an error to L1, which
> is wrong, since L1 has no control over vmcs02's VM-entry MSR-load
> area.


You are right. I hadn't noticed it until you mentioned.


>   The assumption is that this will never happen, because L0 is too
> careful.
>>>> +               exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
>>>> +       }
>>>> +
>>>> +       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
>>>> +
>>>>           return 1;
>>>>    }
>>>>
>>>> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
>>>> index 481ad879197b..e272788bd4b8 100644
>>>> --- a/arch/x86/kvm/vmx/vmcs.h
>>>> +++ b/arch/x86/kvm/vmx/vmcs.h
>>>> @@ -70,6 +70,12 @@ struct loaded_vmcs {
>>>>           struct list_head loaded_vmcss_on_cpu_link;
>>>>           struct vmcs_host_state host_state;
>>>>           struct vmcs_controls_shadow controls_shadow;
>>>> +       /*
>>>> +        * This field is used to set up an invalid VM-entry MSR-load area
>>>> +        * for vmcs02 if an error is detected while processing the entries
>>>> +        * in VM-entry MSR-load area of vmcs12.
>>>> +        */
>>>> +       struct vmx_msr_entry invalid_msr_load_area;
>>>>    };
>>> I'd suggest allocating just one invalid_msr_load_area system-wide, as
>>> mentioned above.
>>>
>>>>    static inline bool is_exception_n(u32 intr_info, u8 vector)
>>>> --
>>>> 2.20.1
>>>>
Jim Mattson Oct. 2, 2019, 5:32 p.m. UTC | #5
On Wed, Oct 2, 2019 at 10:16 AM Krish Sadhukhan
<krish.sadhukhan@oracle.com> wrote:
>
>
> On 10/1/19 3:02 PM, Jim Mattson wrote:
> > On Tue, Oct 1, 2019 at 2:23 PM Krish Sadhukhan
> > <krish.sadhukhan@oracle.com>  wrote:
> >>
> >> On 10/01/2019 10:09 AM, Jim Mattson wrote:
> >>> On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan
> >>> <krish.sadhukhan@oracle.com>  wrote:
> >>>> According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
> >>>> performed in a certain order. Checks on MSRs that are loaded on VM-entry
> >>>> from VM-entry MSR-load area, should be done after verifying VMCS controls,
> >>>> host-state area and guest-state area. As KVM relies on CPU hardware to
> >>>> perform some of these checks, we need to defer VM-exit due to invalid
> >>>> VM-entry MSR-load area to until after CPU hardware completes the earlier
> >>>> checks and is ready to do VMLAUNCH/VMRESUME.
> >>>>
> >>>> In order to defer errors arising from invalid VM-entry MSR-load area in
> >>>> vmcs12, we set up a single invalid entry, which is illegal according to
> >>>> section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of
> >>>> vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure
> >>>> due to MSR loading" after it completes checks on VMCS controls, host-state
> >>>> area and guest-state area. We reflect a synthesized Exit Qualification to
> >>>> our guest.
> >>> This change addresses the priority inversion, but it still potentially
> >>> leaves guest MSRs incorrectly loaded with values from the VMCS12
> >>> VM-entry MSR-load area when a higher priority error condition would
> >>> have precluded any processing of the VM-entry MSR-load area.
> >> May be, we should not load any guest MSR until we have checked the
> >> entire vmcs12 MSR-load area in nested_vmx_load_msr()  ?
> > That's not sufficient. We shouldn't load any guest MSR until all of
> > the checks on host state, controls, and guest state have passed. Since
> > we defer the guest state checks to hardware, that makes the proper
> > ordering a bit problematic. Instead, you can try to arrange to undo
> > the guest MSR loads if a higher priority error is discovered. That
> > won't be trivial.
>
>
> We discussed about undoing guest MSR loads in an earlier thread. You
> said that some MSR updates couldn't be rolled back. If we can't rollback
> some MSR updates and if that leads to an undefined guest configuration,
> may be we should return to L0 and mark it as a hardware failure, like
> what we currently do for some VM-entry failures in vmx_handle_exit():

The MSR loads that can't be rolled back are mainly the ones with
side-effects, like IA32_PRED_CMD. I think these are mostly benign
(except, perhaps, from a performance perspective). I think it's worth
investigating rollback as a partial solution to this problem.

>
>      if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
>                  dump_vmcs();
>                  vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> vcpu->run->fail_entry.hardware_entry_failure_reason
>                          = exit_reason;
>                  return 0;
>          }
>
>
> >>>> Suggested-by: Jim Mattson<jmattson@google.com>
> >>>> Signed-off-by: Krish Sadhukhan<krish.sadhukhan@oracle.com>
> >>>> Reviewed-by: Mihai Carabas<mihai.carabas@oracle.com>
> >>>> Reviewed-by: Liran Alon<liran.alon@oracle.com>
> >>>> ---
> >>>>    arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
> >>>>    arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
> >>>>    arch/x86/kvm/vmx/vmcs.h   |  6 ++++++
> >>>>    3 files changed, 49 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> >>>> index ced9fba32598..b74491c04090 100644
> >>>> --- a/arch/x86/kvm/vmx/nested.c
> >>>> +++ b/arch/x86/kvm/vmx/nested.c
> >>>> @@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
> >>>>                   goto vmentry_fail_vmexit_guest_mode;
> >>>>
> >>>>           if (from_vmentry) {
> >>>> -               exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
> >>>>                   exit_qual = nested_vmx_load_msr(vcpu,
> >>>>                                                   vmcs12->vm_entry_msr_load_addr,
> >>>>                                                   vmcs12->vm_entry_msr_load_count);
> >>>> -               if (exit_qual)
> >>>> -                       goto vmentry_fail_vmexit_guest_mode;
> >>>> +               if (exit_qual) {
> >>>> +                       /*
> >>>> +                        * According to section “VM Entries” in Intel SDM
> >>>> +                        * vol 3C, VM-entry checks are performed in a certain
> >>>> +                        * order. Checks on MSRs that are loaded on VM-entry
> >>>> +                        * from VM-entry MSR-load area, should be done after
> >>>> +                        * verifying VMCS controls, host-state area and
> >>>> +                        * guest-state area. As KVM relies on CPU hardware to
> >>>> +                        * perform some of these checks, we need to defer
> >>>> +                        * VM-exit due to invalid VM-entry MSR-load area to
> >>>> +                        * until after CPU hardware completes the earlier
> >>>> +                        * checks and is ready to do VMLAUNCH/VMRESUME.
> >>>> +                        *
> >>>> +                        * In order to defer errors arising from invalid
> >>>> +                        * VM-entry MSR-load area in vmcs12, we set up a
> >>>> +                        * single invalid entry, which is illegal according
> >>>> +                        * to section "Loading MSRs in Intel SDM vol 3C, in
> >>>> +                        * VM-entry MSR-load area of vmcs02. This will cause
> >>>> +                        * the CPU hardware to VM-exit with "VM-entry
> >>>> +                        * failure due to MSR loading" after it completes
> >>>> +                        * checks on VMCS controls, host-state area and
> >>>> +                        * guest-state area.
> >>>> +                        */
> >>>> +                       vmx->loaded_vmcs->invalid_msr_load_area.index =
> >>>> +                           MSR_FS_BASE;
> >>> Can this field be statically populated during initialization?
> >> Yes.
> >>
> >>>> +                       vmx->loaded_vmcs->invalid_msr_load_area.value =
> >>>> +                           exit_qual;
> >>> This seems awkward. Why not save 16 bytes per loaded_vmcs by
> >>> allocating one invalid_msr_load_area system-wide and just add a 4 byte
> >>> field to struct nested_vmx to store this value?
> >> OK.
> >>
> >>>> +                       vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
> >>>> +                       vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
> >>>> +                           __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
> >>>> +               }
> >>> Do you need to set vmx->nested.dirty_vmcs12 to ensure that
> >>> prepare_vmcs02_constant_state() will be called at the next emulated
> >>> VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR?
> >> Yes.
> >>
> >>>>           } else {
> >>>>                   /*
> >>>>                    * The MMU is not initialized to point at the right entities yet and
> >>>> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> >>>> index 187d39bf0bf1..f3a384235b68 100644
> >>>> --- a/arch/x86/kvm/vmx/nested.h
> >>>> +++ b/arch/x86/kvm/vmx/nested.h
> >>>> @@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
> >>>>    static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
> >>>>                                               u32 exit_reason)
> >>>>    {
> >>>> +       u32 exit_qual;
> >>>>           u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> >>>> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
> >>>>
> >>>>           /*
> >>>>            * At this point, the exit interruption info in exit_intr_info
> >>>> @@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
> >>>>                           vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
> >>>>           }
> >>>>
> >>>> -       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
> >>>> -                         vmcs_readl(EXIT_QUALIFICATION));
> >>>> +       exit_qual = vmcs_readl(EXIT_QUALIFICATION);
> >>>> +
> >>>> +       if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
> >>>> +           (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
> >>>> +                           EXIT_REASON_MSR_LOAD_FAIL))) {
> >>> Is the second conjunct sufficient? i.e. Isn't there a bug in kvm if
> >>> the second conjunct is true and the first is not?
> >> If the first conjunct isn't true and the second one is, it means it's a
> >> hardware-detected MSR-load error. Right ? So won't that be handled the
> >> same way it is handled currently in KVM ?
> > I believe that KVM will currently forward such an error to L1, which
> > is wrong, since L1 has no control over vmcs02's VM-entry MSR-load
> > area.
>
>
> You are right. I hadn't noticed it until you mentioned.
>
>
> >   The assumption is that this will never happen, because L0 is too
> > careful.
> >>>> +               exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
> >>>> +       }
> >>>> +
> >>>> +       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
> >>>> +
> >>>>           return 1;
> >>>>    }
> >>>>
> >>>> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
> >>>> index 481ad879197b..e272788bd4b8 100644
> >>>> --- a/arch/x86/kvm/vmx/vmcs.h
> >>>> +++ b/arch/x86/kvm/vmx/vmcs.h
> >>>> @@ -70,6 +70,12 @@ struct loaded_vmcs {
> >>>>           struct list_head loaded_vmcss_on_cpu_link;
> >>>>           struct vmcs_host_state host_state;
> >>>>           struct vmcs_controls_shadow controls_shadow;
> >>>> +       /*
> >>>> +        * This field is used to set up an invalid VM-entry MSR-load area
> >>>> +        * for vmcs02 if an error is detected while processing the entries
> >>>> +        * in VM-entry MSR-load area of vmcs12.
> >>>> +        */
> >>>> +       struct vmx_msr_entry invalid_msr_load_area;
> >>>>    };
> >>> I'd suggest allocating just one invalid_msr_load_area system-wide, as
> >>> mentioned above.
> >>>
> >>>>    static inline bool is_exception_n(u32 intr_info, u8 vector)
> >>>> --
> >>>> 2.20.1
> >>>>

Patch
diff mbox series

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ced9fba32598..b74491c04090 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3054,12 +3054,40 @@  int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
 		goto vmentry_fail_vmexit_guest_mode;
 
 	if (from_vmentry) {
-		exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
 		exit_qual = nested_vmx_load_msr(vcpu,
 						vmcs12->vm_entry_msr_load_addr,
 						vmcs12->vm_entry_msr_load_count);
-		if (exit_qual)
-			goto vmentry_fail_vmexit_guest_mode;
+		if (exit_qual) {
+			/*
+			 * According to section “VM Entries” in Intel SDM
+			 * vol 3C, VM-entry checks are performed in a certain
+			 * order. Checks on MSRs that are loaded on VM-entry
+			 * from VM-entry MSR-load area, should be done after
+			 * verifying VMCS controls, host-state area and
+			 * guest-state area. As KVM relies on CPU hardware to
+			 * perform some of these checks, we need to defer
+			 * VM-exit due to invalid VM-entry MSR-load area to
+			 * until after CPU hardware completes the earlier
+			 * checks and is ready to do VMLAUNCH/VMRESUME.
+			 *
+			 * In order to defer errors arising from invalid
+			 * VM-entry MSR-load area in vmcs12, we set up a
+			 * single invalid entry, which is illegal according
+			 * to section "Loading MSRs in Intel SDM vol 3C, in
+			 * VM-entry MSR-load area of vmcs02. This will cause
+			 * the CPU hardware to VM-exit with "VM-entry
+			 * failure due to MSR loading" after it completes
+			 * checks on VMCS controls, host-state area and
+			 * guest-state area.
+			 */
+			vmx->loaded_vmcs->invalid_msr_load_area.index =
+			    MSR_FS_BASE;
+			vmx->loaded_vmcs->invalid_msr_load_area.value =
+			    exit_qual;
+			vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
+			vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
+			    __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
+		}
 	} else {
 		/*
 		 * The MMU is not initialized to point at the right entities yet and
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 187d39bf0bf1..f3a384235b68 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -64,7 +64,9 @@  static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
 static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
 					    u32 exit_reason)
 {
+	u32 exit_qual;
 	u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	/*
 	 * At this point, the exit interruption info in exit_intr_info
@@ -81,8 +83,16 @@  static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
 			vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
 	}
 
-	nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
-			  vmcs_readl(EXIT_QUALIFICATION));
+	exit_qual = vmcs_readl(EXIT_QUALIFICATION);
+
+	if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
+	    (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
+			    EXIT_REASON_MSR_LOAD_FAIL))) {
+		exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
+	}
+
+	nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
+
 	return 1;
 }
 
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index 481ad879197b..e272788bd4b8 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -70,6 +70,12 @@  struct loaded_vmcs {
 	struct list_head loaded_vmcss_on_cpu_link;
 	struct vmcs_host_state host_state;
 	struct vmcs_controls_shadow controls_shadow;
+	/*
+	 * This field is used to set up an invalid VM-entry MSR-load area
+	 * for vmcs02 if an error is detected while processing the entries
+	 * in VM-entry MSR-load area of vmcs12.
+	 */
+	struct vmx_msr_entry invalid_msr_load_area;
 };
 
 static inline bool is_exception_n(u32 intr_info, u8 vector)