diff mbox series

[v3,10/21] KVM:x86: Add #CP support in guest exception classification

Message ID 20230511040857.6094-11-weijiang.yang@intel.com (mailing list archive)
State New, archived
Headers show
Series Enable CET Virtualization | expand

Commit Message

Yang, Weijiang May 11, 2023, 4:08 a.m. UTC
Add handling for Control Protection (#CP) exceptions(vector 21).
The new vector is introduced for Intel's Control-Flow Enforcement
Technology (CET) relevant violation cases.

Although #CP belongs contributory exception class, but the actual
effect is conditional on CET being exposed to guest. If CET is not
available to guest, #CP falls back to non-contributory and doesn't
have an error code. The rational is used to fix one unit test failure
encountered in L1. Although the issue now is fixed in unit test case,
keep the handling is reasonable. cr4_guest_rsvd_bits is used to avoid
guest_cpuid_has() lookups.

Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
---
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/x86.c              | 10 +++++++---
 arch/x86/kvm/x86.h              | 13 ++++++++++---
 4 files changed, 19 insertions(+), 7 deletions(-)

Comments

Chao Gao June 6, 2023, 9:08 a.m. UTC | #1
On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>Add handling for Control Protection (#CP) exceptions(vector 21).
>The new vector is introduced for Intel's Control-Flow Enforcement
>Technology (CET) relevant violation cases.
>

>Although #CP belongs contributory exception class, but the actual
>effect is conditional on CET being exposed to guest. If CET is not
>available to guest, #CP falls back to non-contributory and doesn't
>have an error code.

This sounds weird. is this the hardware behavior? If yes, could you
point us to where this behavior is documented?
Yang, Weijiang June 8, 2023, 6:01 a.m. UTC | #2
On 6/6/2023 5:08 PM, Chao Gao wrote:
> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>> Add handling for Control Protection (#CP) exceptions(vector 21).
>> The new vector is introduced for Intel's Control-Flow Enforcement
>> Technology (CET) relevant violation cases.
>>
>> Although #CP belongs contributory exception class, but the actual
>> effect is conditional on CET being exposed to guest. If CET is not
>> available to guest, #CP falls back to non-contributory and doesn't
>> have an error code.
> This sounds weird. is this the hardware behavior? If yes, could you
> point us to where this behavior is documented?

It's not SDM documented behavior.

The original description is provided by Sean here:

Re: [PATCH v15 04/14] KVM: x86: Add #CP support in guest exception 
dispatch - Sean Christopherson (kernel.org) 
<https://lore.kernel.org/all/YBsZwvwhshw+s7yQ@google.com/>

I also verified the issue on my side.  If the KVM CET patches are there 
in L1 but CET is not enabled, and running some unit test can

trigger unit test failure although the #CP induced one has been fixed in 
KVM unit tests.
Sean Christopherson June 15, 2023, 11:58 p.m. UTC | #3
On Thu, Jun 08, 2023, Weijiang Yang wrote:
> 
> On 6/6/2023 5:08 PM, Chao Gao wrote:
> > On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
> > > Add handling for Control Protection (#CP) exceptions(vector 21).
> > > The new vector is introduced for Intel's Control-Flow Enforcement
> > > Technology (CET) relevant violation cases.
> > > 
> > > Although #CP belongs contributory exception class, but the actual
> > > effect is conditional on CET being exposed to guest. If CET is not
> > > available to guest, #CP falls back to non-contributory and doesn't
> > > have an error code.
> > This sounds weird. is this the hardware behavior? If yes, could you
> > point us to where this behavior is documented?
> 
> It's not SDM documented behavior.

The #CP behavior needs to be documented.  Please pester whoever you need to in
order to make that happen.
Yang, Weijiang June 16, 2023, 6:56 a.m. UTC | #4
On 6/16/2023 7:58 AM, Sean Christopherson wrote:
> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>> Technology (CET) relevant violation cases.
>>>>
>>>> Although #CP belongs contributory exception class, but the actual
>>>> effect is conditional on CET being exposed to guest. If CET is not
>>>> available to guest, #CP falls back to non-contributory and doesn't
>>>> have an error code.
>>> This sounds weird. is this the hardware behavior? If yes, could you
>>> point us to where this behavior is documented?
>> It's not SDM documented behavior.
> The #CP behavior needs to be documented.  Please pester whoever you need to in
> order to make that happen.

Do you mean documentation for #CP as an generic exception or the 
behavior in KVM as

this patch shows?
Sean Christopherson June 16, 2023, 6:57 p.m. UTC | #5
On Fri, Jun 16, 2023, Weijiang Yang wrote:
> 
> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
> > On Thu, Jun 08, 2023, Weijiang Yang wrote:
> > > On 6/6/2023 5:08 PM, Chao Gao wrote:
> > > > On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
> > > > > Add handling for Control Protection (#CP) exceptions(vector 21).
> > > > > The new vector is introduced for Intel's Control-Flow Enforcement
> > > > > Technology (CET) relevant violation cases.
> > > > > 
> > > > > Although #CP belongs contributory exception class, but the actual
> > > > > effect is conditional on CET being exposed to guest. If CET is not
> > > > > available to guest, #CP falls back to non-contributory and doesn't
> > > > > have an error code.
> > > > This sounds weird. is this the hardware behavior? If yes, could you
> > > > point us to where this behavior is documented?
> > > It's not SDM documented behavior.
> > The #CP behavior needs to be documented.  Please pester whoever you need to in
> > order to make that happen.
> 
> Do you mean documentation for #CP as an generic exception or the behavior in
> KVM as this patch shows?

As I pointed out two *years* ago, this entry in the SDM

  — The field's deliver-error-code bit (bit 11) is 1 if each of the following
    holds: (1) the interruption type is hardware exception; (2) bit 0
    (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
    (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
    indicates one of the following exceptions: #DF (vector 8), #TS (10),
    #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).

needs to read something like

  — The field's deliver-error-code bit (bit 11) is 1 if each of the following
    holds: (1) the interruption type is hardware exception; (2) bit 0
    (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
    (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
    indicates one of the following exceptions: #DF (vector 8), #TS (10),
    #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]

    [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
        support for the 1-setting of CR4.CET.
Yang, Weijiang June 19, 2023, 9:28 a.m. UTC | #6
On 6/17/2023 2:57 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>>> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>>>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>>>> Technology (CET) relevant violation cases.
>>>>>>
>>>>>> Although #CP belongs contributory exception class, but the actual
>>>>>> effect is conditional on CET being exposed to guest. If CET is not
>>>>>> available to guest, #CP falls back to non-contributory and doesn't
>>>>>> have an error code.
>>>>> This sounds weird. is this the hardware behavior? If yes, could you
>>>>> point us to where this behavior is documented?
>>>> It's not SDM documented behavior.
>>> The #CP behavior needs to be documented.  Please pester whoever you need to in
>>> order to make that happen.
>> Do you mean documentation for #CP as an generic exception or the behavior in
>> KVM as this patch shows?
> As I pointed out two *years* ago, this entry in the SDM
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>
> needs to read something like
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>
>      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>          support for the 1-setting of CR4.CET.

OK, I'll route the messages to related person, thanks!
Yang, Weijiang June 30, 2023, 9:34 a.m. UTC | #7
On 6/17/2023 2:57 AM, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>>> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>>>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>>>> Technology (CET) relevant violation cases.
>>>>>>
>>>>>> Although #CP belongs contributory exception class, but the actual
>>>>>> effect is conditional on CET being exposed to guest. If CET is not
>>>>>> available to guest, #CP falls back to non-contributory and doesn't
>>>>>> have an error code.
>>>>> This sounds weird. is this the hardware behavior? If yes, could you
>>>>> point us to where this behavior is documented?
>>>> It's not SDM documented behavior.
>>> The #CP behavior needs to be documented.  Please pester whoever you need to in
>>> order to make that happen.
>> Do you mean documentation for #CP as an generic exception or the behavior in
>> KVM as this patch shows?
> As I pointed out two *years* ago, this entry in the SDM
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>
> needs to read something like
>
>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>      holds: (1) the interruption type is hardware exception; (2) bit 0
>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>
>      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>          support for the 1-setting of CR4.CET.

Hi, Sean,

I sent above change request to Gil(added in cc), but he shared different 
opinion on this issue:


"It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] 
as 1.

  However, there were earlier parts without CET that enumerated 
IA32_VMX_BASIC[56] as 0.

  On those parts, an attempt to inject an exception with vector 21 (#CP) 
with an error code would fail.

(Injection of exception 21 with no error code would be allowed.)

  It may make things clearer if we document the statement above (all 
CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).

I will see if we can update future revisions of the SDM to clarify this."


Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] before 
inject exception to nested VM.

And this patch could be removed, instead need another patch like below:

diff --git a/arch/x86/include/asm/msr-index.h 
b/arch/x86/include/asm/msr-index.h
index ad35355ee43e..6b33aacc8587 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1076,6 +1076,7 @@
  #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
  #define VMX_BASIC_MEM_TYPE_WB    6LLU
  #define VMX_BASIC_INOUT        0x0040000000000000LLU
+#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU

  /* Resctrl MSRs: */
  /* - Intel: */
diff --git a/arch/x86/kvm/vmx/capabilities.h 
b/arch/x86/kvm/vmx/capabilities.h
index 85cffeae7f10..4b1ed4dc03bc 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
      return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
  }

+static inline bool cpu_has_vmx_basic_check_errcode(void)
+{
+    return    (((u64)vmcs_config.basic_cap << 32) & 
VMX_BASIC_CHECK_ERRCODE);
+}
+
  static inline bool cpu_has_virtual_nmis(void)
  {
      return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 78524daa2cb2..92aa4fc3d233 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx 
*vmx, u64 data)
  {
      const u64 feature_and_reserved =
          /* feature (except bit 48; see below) */
-        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
+        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
          /* reserved */
-        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
+        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
      u64 vmx_basic = vmcs_config.nested.basic;

      if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
@@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct 
kvm_vcpu *vcpu,
          should_have_error_code =
              intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
              x86_exception_has_error_code(vector);
-        if (CC(has_error_code != should_have_error_code))
+        if (!cpu_has_vmx_basic_check_errcode() &&
+            CC(has_error_code != should_have_error_code))
              return -EINVAL;

          /* VM-entry exception error code */
@@ -6986,6 +6987,8 @@ static void nested_vmx_setup_basic(struct 
nested_vmx_msrs *msrs)

      if (cpu_has_vmx_basic_inout())
          msrs->basic |= VMX_BASIC_INOUT;
+    if (cpu_has_vmx_basic_check_errcode())
+        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
  }

  static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d70f2e94b187..95c0eab7805c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config 
*vmcs_conf,
      rdmsrl(MSR_IA32_VMX_MISC, misc_msr);

      vmcs_conf->size = vmx_msr_high & 0x1fff;
-    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
+    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;

      vmcs_conf->revision_id = vmx_msr_low;
Chao Gao June 30, 2023, 10:27 a.m. UTC | #8
On Fri, Jun 30, 2023 at 05:34:28PM +0800, Yang, Weijiang wrote:
>
>On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>> > On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>> > > On Thu, Jun 08, 2023, Weijiang Yang wrote:
>> > > > On 6/6/2023 5:08 PM, Chao Gao wrote:
>> > > > > On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>> > > > > > Add handling for Control Protection (#CP) exceptions(vector 21).
>> > > > > > The new vector is introduced for Intel's Control-Flow Enforcement
>> > > > > > Technology (CET) relevant violation cases.
>> > > > > > 
>> > > > > > Although #CP belongs contributory exception class, but the actual
>> > > > > > effect is conditional on CET being exposed to guest. If CET is not
>> > > > > > available to guest, #CP falls back to non-contributory and doesn't
>> > > > > > have an error code.
>> > > > > This sounds weird. is this the hardware behavior? If yes, could you
>> > > > > point us to where this behavior is documented?
>> > > > It's not SDM documented behavior.
>> > > The #CP behavior needs to be documented.  Please pester whoever you need to in
>> > > order to make that happen.
>> > Do you mean documentation for #CP as an generic exception or the behavior in
>> > KVM as this patch shows?
>> As I pointed out two *years* ago, this entry in the SDM
>> 
>>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>      holds: (1) the interruption type is hardware exception; (2) bit 0
>>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>> 
>> needs to read something like
>> 
>>    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>      holds: (1) the interruption type is hardware exception; (2) bit 0
>>      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>      indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>> 
>>      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>>          support for the 1-setting of CR4.CET.
>
>Hi, Sean,
>
>I sent above change request to Gil(added in cc), but he shared different
>opinion on this issue:
>
>
>"It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] as 1.
>
> However, there were earlier parts without CET that enumerated
>IA32_VMX_BASIC[56] as 0.
>
> On those parts, an attempt to inject an exception with vector 21 (#CP) with
>an error code would fail.
>
>(Injection of exception 21 with no error code would be allowed.)
>
> It may make things clearer if we document the statement above (all
>CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>
>I will see if we can update future revisions of the SDM to clarify this."
>
>
>Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] before
>inject exception to nested VM.

And KVM can hide CET from guests if IA32_VMX_BASIC[56] is 0.

>
>And this patch could be removed, instead need another patch like below:
>
>diff --git a/arch/x86/include/asm/msr-index.h
>b/arch/x86/include/asm/msr-index.h
>index ad35355ee43e..6b33aacc8587 100644
>--- a/arch/x86/include/asm/msr-index.h
>+++ b/arch/x86/include/asm/msr-index.h
>@@ -1076,6 +1076,7 @@
> #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
> #define VMX_BASIC_MEM_TYPE_WB    6LLU
> #define VMX_BASIC_INOUT        0x0040000000000000LLU
>+#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
>
> /* Resctrl MSRs: */
> /* - Intel: */
>diff --git a/arch/x86/kvm/vmx/capabilities.h
>b/arch/x86/kvm/vmx/capabilities.h
>index 85cffeae7f10..4b1ed4dc03bc 100644
>--- a/arch/x86/kvm/vmx/capabilities.h
>+++ b/arch/x86/kvm/vmx/capabilities.h
>@@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
>     return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
> }
>
>+static inline bool cpu_has_vmx_basic_check_errcode(void)
>+{
>+    return    (((u64)vmcs_config.basic_cap << 32) &
>VMX_BASIC_CHECK_ERRCODE);
>+}
>+
> static inline bool cpu_has_virtual_nmis(void)
> {
>     return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
>diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>index 78524daa2cb2..92aa4fc3d233 100644
>--- a/arch/x86/kvm/vmx/nested.c
>+++ b/arch/x86/kvm/vmx/nested.c
>@@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx,
>u64 data)
> {
>     const u64 feature_and_reserved =
>         /* feature (except bit 48; see below) */
>-        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
>+        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
>         /* reserved */
>-        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
>+        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
>     u64 vmx_basic = vmcs_config.nested.basic;
>
>     if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
>@@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct
>kvm_vcpu *vcpu,
>         should_have_error_code =
>             intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>             x86_exception_has_error_code(vector);
>-        if (CC(has_error_code != should_have_error_code))
>+        if (!cpu_has_vmx_basic_check_errcode() &&

We can skip computing should_have_error_code. and we should check if
IA32_VMX_BASIC[56] is set for this vCPU (i.e. in vmx->nested.msrs.basic)
rather than host/kvm capability.

>+            CC(has_error_code != should_have_error_code))
>             return -EINVAL;
>
>         /* VM-entry exception error code */
>@@ -6986,6 +6987,8 @@ static void nested_vmx_setup_basic(struct
>nested_vmx_msrs *msrs)
>
>     if (cpu_has_vmx_basic_inout())
>         msrs->basic |= VMX_BASIC_INOUT;
>+    if (cpu_has_vmx_basic_check_errcode())
>+        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
> }
>
> static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
>diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>index d70f2e94b187..95c0eab7805c 100644
>--- a/arch/x86/kvm/vmx/vmx.c
>+++ b/arch/x86/kvm/vmx/vmx.c
>@@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config
>*vmcs_conf,
>     rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
>
>     vmcs_conf->size = vmx_msr_high & 0x1fff;
>-    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>+    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;
>
>     vmcs_conf->revision_id = vmx_msr_low;
>
>
Yang, Weijiang June 30, 2023, 12:05 p.m. UTC | #9
On 6/30/2023 6:27 PM, Chao Gao wrote:
> On Fri, Jun 30, 2023 at 05:34:28PM +0800, Yang, Weijiang wrote:
>> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>>>> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>>>>> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>>>>>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>>>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>>>>>> The new vector is introduced for Intel's Control-Flow Enforcement
>>>>>>>> Technology (CET) relevant violation cases.
>>>>>>>>
>>>>>>>> Although #CP belongs contributory exception class, but the actual
>>>>>>>> effect is conditional on CET being exposed to guest. If CET is not
>>>>>>>> available to guest, #CP falls back to non-contributory and doesn't
>>>>>>>> have an error code.
>>>>>>> This sounds weird. is this the hardware behavior? If yes, could you
>>>>>>> point us to where this behavior is documented?
>>>>>> It's not SDM documented behavior.
>>>>> The #CP behavior needs to be documented.  Please pester whoever you need to in
>>>>> order to make that happen.
>>>> Do you mean documentation for #CP as an generic exception or the behavior in
>>>> KVM as this patch shows?
>>> As I pointed out two *years* ago, this entry in the SDM
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>>>
>>> needs to read something like
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>>>
>>>       [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>>>           support for the 1-setting of CR4.CET.
>> Hi, Sean,
>>
>> I sent above change request to Gil(added in cc), but he shared different
>> opinion on this issue:
>>
>>
>> "It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] as 1.
>>
>>   However, there were earlier parts without CET that enumerated
>> IA32_VMX_BASIC[56] as 0.
>>
>>   On those parts, an attempt to inject an exception with vector 21 (#CP) with
>> an error code would fail.
>>
>> (Injection of exception 21 with no error code would be allowed.)
>>
>>   It may make things clearer if we document the statement above (all
>> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>>
>> I will see if we can update future revisions of the SDM to clarify this."
>>
>>
>> Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] before
>> inject exception to nested VM.
> And KVM can hide CET from guests if IA32_VMX_BASIC[56] is 0.

Yes, this scratch patch didn't cover cross-check with CET enabling, thanks!

>
>> And this patch could be removed, instead need another patch like below:
>>
>> diff --git a/arch/x86/include/asm/msr-index.h
>> b/arch/x86/include/asm/msr-index.h
>> index ad35355ee43e..6b33aacc8587 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1076,6 +1076,7 @@
>>   #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>>   #define VMX_BASIC_MEM_TYPE_WB    6LLU
>>   #define VMX_BASIC_INOUT        0x0040000000000000LLU
>> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
>>
>>   /* Resctrl MSRs: */
>>   /* - Intel: */
>> diff --git a/arch/x86/kvm/vmx/capabilities.h
>> b/arch/x86/kvm/vmx/capabilities.h
>> index 85cffeae7f10..4b1ed4dc03bc 100644
>> --- a/arch/x86/kvm/vmx/capabilities.h
>> +++ b/arch/x86/kvm/vmx/capabilities.h
>> @@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
>>       return    (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT);
>>   }
>>
>> +static inline bool cpu_has_vmx_basic_check_errcode(void)
>> +{
>> +    return    (((u64)vmcs_config.basic_cap << 32) &
>> VMX_BASIC_CHECK_ERRCODE);
>> +}
>> +
>>   static inline bool cpu_has_virtual_nmis(void)
>>   {
>>       return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 78524daa2cb2..92aa4fc3d233 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx,
>> u64 data)
>>   {
>>       const u64 feature_and_reserved =
>>           /* feature (except bit 48; see below) */
>> -        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
>> +        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
>>           /* reserved */
>> -        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
>> +        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
>>       u64 vmx_basic = vmcs_config.nested.basic;
>>
>>       if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
>> @@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct
>> kvm_vcpu *vcpu,
>>           should_have_error_code =
>>               intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>>               x86_exception_has_error_code(vector);
>> -        if (CC(has_error_code != should_have_error_code))
>> +        if (!cpu_has_vmx_basic_check_errcode() &&
> We can skip computing should_have_error_code. and we should check if
> IA32_VMX_BASIC[56] is set for this vCPU (i.e. in vmx->nested.msrs.basic)
> rather than host/kvm capability.

Oops, I confused myself, yes, need to reshape the code a bit and use 
msrs.basic

to check the bit status, thanks!

>
>> +            CC(has_error_code != should_have_error_code))
>>               return -EINVAL;
>>
>>           /* VM-entry exception error code */
>> @@ -6986,6 +6987,8 @@ static void nested_vmx_setup_basic(struct
>> nested_vmx_msrs *msrs)
>>
>>       if (cpu_has_vmx_basic_inout())
>>           msrs->basic |= VMX_BASIC_INOUT;
>> +    if (cpu_has_vmx_basic_check_errcode())
>> +        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
>>   }
>>
>>   static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index d70f2e94b187..95c0eab7805c 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config
>> *vmcs_conf,
>>       rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
>>
>>       vmcs_conf->size = vmx_msr_high & 0x1fff;
>> -    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>> +    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;
>>
>>       vmcs_conf->revision_id = vmx_msr_low;
>>
>>
Neiger, Gil June 30, 2023, 3:05 p.m. UTC | #10
Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.

One can check that bit before injecting a #CP with error code, but it should not be necessary if CET is enumerated.

Of course, if the KVM may run as a guest of another VMM/hypervisor, it may be that the virtual CPU in which KVM operates may enumerate CET but clear the bit in IA32_VMX_BASIC.

				- Gil

-----Original Message-----
From: Yang, Weijiang <weijiang.yang@intel.com> 
Sent: Friday, June 30, 2023 05:05
To: Gao, Chao <chao.gao@intel.com>
Cc: Christopherson,, Sean <seanjc@google.com>; pbonzini@redhat.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; peterz@infradead.org; rppt@kernel.org; binbin.wu@linux.intel.com; Edgecombe, Rick P <rick.p.edgecombe@intel.com>; john.allen@amd.com; Neiger, Gil <gil.neiger@intel.com>
Subject: Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification


On 6/30/2023 6:27 PM, Chao Gao wrote:
> On Fri, Jun 30, 2023 at 05:34:28PM +0800, Yang, Weijiang wrote:
>> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>>> On Fri, Jun 16, 2023, Weijiang Yang wrote:
>>>> On 6/16/2023 7:58 AM, Sean Christopherson wrote:
>>>>> On Thu, Jun 08, 2023, Weijiang Yang wrote:
>>>>>> On 6/6/2023 5:08 PM, Chao Gao wrote:
>>>>>>> On Thu, May 11, 2023 at 12:08:46AM -0400, Yang Weijiang wrote:
>>>>>>>> Add handling for Control Protection (#CP) exceptions(vector 21).
>>>>>>>> The new vector is introduced for Intel's Control-Flow 
>>>>>>>> Enforcement Technology (CET) relevant violation cases.
>>>>>>>>
>>>>>>>> Although #CP belongs contributory exception class, but the 
>>>>>>>> actual effect is conditional on CET being exposed to guest. If 
>>>>>>>> CET is not available to guest, #CP falls back to 
>>>>>>>> non-contributory and doesn't have an error code.
>>>>>>> This sounds weird. is this the hardware behavior? If yes, could 
>>>>>>> you point us to where this behavior is documented?
>>>>>> It's not SDM documented behavior.
>>>>> The #CP behavior needs to be documented.  Please pester whoever 
>>>>> you need to in order to make that happen.
>>>> Do you mean documentation for #CP as an generic exception or the 
>>>> behavior in KVM as this patch shows?
>>> As I pointed out two *years* ago, this entry in the SDM
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>>>
>>> needs to read something like
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP 
>>> (21)[1]
>>>
>>>       [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>>>           support for the 1-setting of CR4.CET.
>> Hi, Sean,
>>
>> I sent above change request to Gil(added in cc), but he shared 
>> different opinion on this issue:
>>
>>
>> "It is the case that all CET-capable parts enumerate IA32_VMX_BASIC[56] as 1.
>>
>>   However, there were earlier parts without CET that enumerated 
>> IA32_VMX_BASIC[56] as 0.
>>
>>   On those parts, an attempt to inject an exception with vector 21 
>> (#CP) with an error code would fail.
>>
>> (Injection of exception 21 with no error code would be allowed.)
>>
>>   It may make things clearer if we document the statement above (all 
>> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>>
>> I will see if we can update future revisions of the SDM to clarify this."
>>
>>
>> Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] 
>> before inject exception to nested VM.
> And KVM can hide CET from guests if IA32_VMX_BASIC[56] is 0.

Yes, this scratch patch didn't cover cross-check with CET enabling, thanks!

>
>> And this patch could be removed, instead need another patch like below:
>>
>> diff --git a/arch/x86/include/asm/msr-index.h
>> b/arch/x86/include/asm/msr-index.h
>> index ad35355ee43e..6b33aacc8587 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1076,6 +1076,7 @@
>>   #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>>   #define VMX_BASIC_MEM_TYPE_WB    6LLU
>>   #define VMX_BASIC_INOUT        0x0040000000000000LLU
>> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
>>
>>   /* Resctrl MSRs: */
>>   /* - Intel: */
>> diff --git a/arch/x86/kvm/vmx/capabilities.h 
>> b/arch/x86/kvm/vmx/capabilities.h index 85cffeae7f10..4b1ed4dc03bc 
>> 100644
>> --- a/arch/x86/kvm/vmx/capabilities.h
>> +++ b/arch/x86/kvm/vmx/capabilities.h
>> @@ -79,6 +79,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
>>       return    (((u64)vmcs_config.basic_cap << 32) & 
>> VMX_BASIC_INOUT);
>>   }
>>
>> +static inline bool cpu_has_vmx_basic_check_errcode(void)
>> +{
>> +    return    (((u64)vmcs_config.basic_cap << 32) &
>> VMX_BASIC_CHECK_ERRCODE);
>> +}
>> +
>>   static inline bool cpu_has_virtual_nmis(void)
>>   {
>>       return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS 
>> && diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c 
>> index 78524daa2cb2..92aa4fc3d233 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -1227,9 +1227,9 @@ static int vmx_restore_vmx_basic(struct 
>> vcpu_vmx *vmx,
>> u64 data)
>>   {
>>       const u64 feature_and_reserved =
>>           /* feature (except bit 48; see below) */
>> -        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
>> +        BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) | BIT_ULL(56) |
>>           /* reserved */
>> -        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
>> +        BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 57);
>>       u64 vmx_basic = vmcs_config.nested.basic;
>>
>>       if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved)) 
>> @@ -2873,7 +2873,8 @@ static int 
>> nested_check_vm_entry_controls(struct
>> kvm_vcpu *vcpu,
>>           should_have_error_code =
>>               intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>>               x86_exception_has_error_code(vector);
>> -        if (CC(has_error_code != should_have_error_code))
>> +        if (!cpu_has_vmx_basic_check_errcode() &&
> We can skip computing should_have_error_code. and we should check if 
> IA32_VMX_BASIC[56] is set for this vCPU (i.e. in 
> vmx->nested.msrs.basic) rather than host/kvm capability.

Oops, I confused myself, yes, need to reshape the code a bit and use msrs.basic

to check the bit status, thanks!

>
>> +            CC(has_error_code != should_have_error_code))
>>               return -EINVAL;
>>
>>           /* VM-entry exception error code */ @@ -6986,6 +6987,8 @@ 
>> static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
>>
>>       if (cpu_has_vmx_basic_inout())
>>           msrs->basic |= VMX_BASIC_INOUT;
>> +    if (cpu_has_vmx_basic_check_errcode())
>> +        msrs->basic |= VMX_BASIC_CHECK_ERRCODE;
>>   }
>>
>>   static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs) 
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 
>> d70f2e94b187..95c0eab7805c 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2748,7 +2748,7 @@ static int setup_vmcs_config(struct vmcs_config 
>> *vmcs_conf,
>>       rdmsrl(MSR_IA32_VMX_MISC, misc_msr);
>>
>>       vmcs_conf->size = vmx_msr_high & 0x1fff;
>> -    vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff;
>> +    vmcs_conf->basic_cap = vmx_msr_high & ~0x7fff;
>>
>>       vmcs_conf->revision_id = vmx_msr_low;
>>
>>
Sean Christopherson June 30, 2023, 3:07 p.m. UTC | #11
On Fri, Jun 30, 2023, Weijiang Yang wrote:
> 
> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
> > > Do you mean documentation for #CP as an generic exception or the behavior in
> > > KVM as this patch shows?
> > As I pointed out two *years* ago, this entry in the SDM
> > 
> >    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
> >      holds: (1) the interruption type is hardware exception; (2) bit 0
> >      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
> >      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
> >      indicates one of the following exceptions: #DF (vector 8), #TS (10),
> >      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
> > 
> > needs to read something like
> > 
> >    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
> >      holds: (1) the interruption type is hardware exception; (2) bit 0
> >      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
> >      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
> >      indicates one of the following exceptions: #DF (vector 8), #TS (10),
> >      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
> > 
> >      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
> >          support for the 1-setting of CR4.CET.
> 
> Hi, Sean,
> 
> I sent above change request to Gil(added in cc), but he shared different
> opinion on this issue:

Heh, "opinion".

>  It may make things clearer if we document the statement above (all
> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
> 
> I will see if we can update future revisions of the SDM to clarify this."

That would be helpful.  Though to be perfectly honest, I simply overlooked the
existence of IA32_VMX_BASIC[56].

Thanks!

> Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] before
> inject exception to nested VM.
> 
> And this patch could be removed, instead need another patch like below:
> 
> diff --git a/arch/x86/include/asm/msr-index.h
> b/arch/x86/include/asm/msr-index.h
> index ad35355ee43e..6b33aacc8587 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1076,6 +1076,7 @@
>  #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>  #define VMX_BASIC_MEM_TYPE_WB    6LLU
>  #define VMX_BASIC_INOUT        0x0040000000000000LLU
> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU

"Check Error Code" isn't a great description.  The flag enumerates that there the
CPU does *not* perform consistency checks on the error code when injecting hardware
exceptions.

So something like this?

  VMX_BASIC_NO_HW_ERROR_CODE_CC

or maybe

  VMX_BASIC_PM_NO_HW_ERROR_CODE_CC

if we want to capture that only protected mode is exempt (I personally prefer
just VMX_BASIC_NO_HW_ERROR_CODE_CC as "PM" is a bit ambiguous).

> @@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct
> kvm_vcpu *vcpu,
>          should_have_error_code =
>              intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>              x86_exception_has_error_code(vector);
> -        if (CC(has_error_code != should_have_error_code))
> +        if (!cpu_has_vmx_basic_check_errcode() &&
> +            CC(has_error_code != should_have_error_code))

This is wrong on mutiple fronts:

  1. The new feature flag only excempts hardware exceptions delivered to guests
     with CR0.PE=1.  The above will skip the consistency check for all event injection.

  2. KVM needs to check the CPU model that is exposed to L1, not the capabilities
     of the host CPU.

Highlighting the key phrases in the SDM:

  The field's deliver-error-code bit (bit 11) is 1 if each of the following holds: (1) the interruption type is
                                                      ^^^^^^^
  hardware exception; (2) bit 0 (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
  (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector indicates one of the following
  exceptions: #DF (vector 8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
  
  The field's deliver-error-code bit is 0 if any of the following holds: (1) the interruption type is not hardware
                                             ^^^^^^
  exception; (2) bit 0 is clear in the CR0 field in the guest-state area; or (3) IA32_VMX_BASIC[56] is read as
  0 and the vector is in one of the following ranges: 0–7, 9, 15, 16, or 18–31.

I think what we want is:

		/* VM-entry interruption-info field: deliver error code */
		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
		    !nested_cpu_has_no_hw_error_code_cc(vcpu)) {
			should_have_error_code =
				intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
				x86_exception_has_error_code(vector);
			if (CC(has_error_code != should_have_error_code))
				return -EINVAL;
		}
Sean Christopherson June 30, 2023, 3:15 p.m. UTC | #12
On Fri, Jun 30, 2023, Gil Neiger wrote:
> Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.
> 
> One can check that bit before injecting a #CP with error code, but it should
> not be necessary if CET is enumerated.
> 
> Of course, if the KVM may run as a guest of another VMM/hypervisor, it may be
> that the virtual CPU in which KVM operates may enumerate CET but clear the
> bit in IA32_VMX_BASIC.

Yeah, I think KVM should be paranoid and expose CET to the guest if and only if
IA32_VMX_BASIC[56] is 1.  That'll also help validate nested support, e.g. will
make it more obvious if userspace+KVM provides a  "bad" model to L1.
Neiger, Gil June 30, 2023, 3:21 p.m. UTC | #13
Just in case it is not clear:  event delivery in real mode never includes an error code.  That is why the PE bit in CR0 is checked.

			- Gil

-----Original Message-----
From: Sean Christopherson <seanjc@google.com> 
Sent: Friday, June 30, 2023 08:08
To: Yang, Weijiang <weijiang.yang@intel.com>
Cc: Gao, Chao <chao.gao@intel.com>; pbonzini@redhat.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; peterz@infradead.org; rppt@kernel.org; binbin.wu@linux.intel.com; Edgecombe, Rick P <rick.p.edgecombe@intel.com>; john.allen@amd.com; Neiger, Gil <gil.neiger@intel.com>
Subject: Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification

On Fri, Jun 30, 2023, Weijiang Yang wrote:
> 
> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
> > > Do you mean documentation for #CP as an generic exception or the 
> > > behavior in KVM as this patch shows?
> > As I pointed out two *years* ago, this entry in the SDM
> > 
> >    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
> >      holds: (1) the interruption type is hardware exception; (2) bit 0
> >      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
> >      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
> >      indicates one of the following exceptions: #DF (vector 8), #TS (10),
> >      #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
> > 
> > needs to read something like
> > 
> >    — The field's deliver-error-code bit (bit 11) is 1 if each of the following
> >      holds: (1) the interruption type is hardware exception; (2) bit 0
> >      (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
> >      (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
> >      indicates one of the following exceptions: #DF (vector 8), #TS (10),
> >      #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP 
> > (21)[1]
> > 
> >      [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
> >          support for the 1-setting of CR4.CET.
> 
> Hi, Sean,
> 
> I sent above change request to Gil(added in cc), but he shared 
> different opinion on this issue:

Heh, "opinion".

>  It may make things clearer if we document the statement above (all 
> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
> 
> I will see if we can update future revisions of the SDM to clarify this."

That would be helpful.  Though to be perfectly honest, I simply overlooked the existence of IA32_VMX_BASIC[56].

Thanks!

> Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] 
> before inject exception to nested VM.
> 
> And this patch could be removed, instead need another patch like below:
> 
> diff --git a/arch/x86/include/asm/msr-index.h
> b/arch/x86/include/asm/msr-index.h
> index ad35355ee43e..6b33aacc8587 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1076,6 +1076,7 @@
>  #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>  #define VMX_BASIC_MEM_TYPE_WB    6LLU
>  #define VMX_BASIC_INOUT        0x0040000000000000LLU
> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU

"Check Error Code" isn't a great description.  The flag enumerates that there the CPU does *not* perform consistency checks on the error code when injecting hardware exceptions.

So something like this?

  VMX_BASIC_NO_HW_ERROR_CODE_CC

or maybe

  VMX_BASIC_PM_NO_HW_ERROR_CODE_CC

if we want to capture that only protected mode is exempt (I personally prefer just VMX_BASIC_NO_HW_ERROR_CODE_CC as "PM" is a bit ambiguous).

> @@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct
> kvm_vcpu *vcpu,
>          should_have_error_code =
>              intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>              x86_exception_has_error_code(vector);
> -        if (CC(has_error_code != should_have_error_code))
> +        if (!cpu_has_vmx_basic_check_errcode() &&
> +            CC(has_error_code != should_have_error_code))

This is wrong on mutiple fronts:

  1. The new feature flag only excempts hardware exceptions delivered to guests
     with CR0.PE=1.  The above will skip the consistency check for all event injection.

  2. KVM needs to check the CPU model that is exposed to L1, not the capabilities
     of the host CPU.

Highlighting the key phrases in the SDM:

  The field's deliver-error-code bit (bit 11) is 1 if each of the following holds: (1) the interruption type is
                                                      ^^^^^^^
  hardware exception; (2) bit 0 (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
  (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector indicates one of the following
  exceptions: #DF (vector 8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
  
  The field's deliver-error-code bit is 0 if any of the following holds: (1) the interruption type is not hardware
                                             ^^^^^^
  exception; (2) bit 0 is clear in the CR0 field in the guest-state area; or (3) IA32_VMX_BASIC[56] is read as
  0 and the vector is in one of the following ranges: 0–7, 9, 15, 16, or 18–31.

I think what we want is:

		/* VM-entry interruption-info field: deliver error code */
		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
		    !nested_cpu_has_no_hw_error_code_cc(vcpu)) {
			should_have_error_code =
				intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
				x86_exception_has_error_code(vector);
			if (CC(has_error_code != should_have_error_code))
				return -EINVAL;
		}
Yang, Weijiang July 1, 2023, 1:54 a.m. UTC | #14
On 6/30/2023 11:05 PM, Neiger, Gil wrote:
> Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.
>
> One can check that bit before injecting a #CP with error code, but it should not be necessary if CET is enumerated.
>
> Of course, if the KVM may run as a guest of another VMM/hypervisor, it may be that the virtual CPU in which KVM operates may enumerate CET but clear the bit in IA32_VMX_BASIC.
>
> 				- Gil
Thanks Gil for clarity!
>
> -----Original Message-----
> From: Yang, Weijiang <weijiang.yang@intel.com>
> Sent: Friday, June 30, 2023 05:05
> To: Gao, Chao <chao.gao@intel.com>
> Cc: Christopherson,, Sean <seanjc@google.com>; pbonzini@redhat.com; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; peterz@infradead.org; rppt@kernel.org; binbin.wu@linux.intel.com; Edgecombe, Rick P <rick.p.edgecombe@intel.com>; john.allen@amd.com; Neiger, Gil <gil.neiger@intel.com>
> Subject: Re: [PATCH v3 10/21] KVM:x86: Add #CP support in guest exception classification
>
>
> On 6/30/2023 6:27 PM, Chao Gao wrote:
> [...]
Yang, Weijiang July 1, 2023, 1:57 a.m. UTC | #15
On 6/30/2023 11:07 PM, Sean Christopherson wrote:
> On Fri, Jun 30, 2023, Weijiang Yang wrote:
>> On 6/17/2023 2:57 AM, Sean Christopherson wrote:
>>>> Do you mean documentation for #CP as an generic exception or the behavior in
>>>> KVM as this patch shows?
>>> As I pointed out two *years* ago, this entry in the SDM
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>>>
>>> needs to read something like
>>>
>>>     — The field's deliver-error-code bit (bit 11) is 1 if each of the following
>>>       holds: (1) the interruption type is hardware exception; (2) bit 0
>>>       (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>>>       (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector
>>>       indicates one of the following exceptions: #DF (vector 8), #TS (10),
>>>       #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), or #CP (21)[1]
>>>
>>>       [1] #CP has an error code if and only if IA32_VMX_CR4_FIXED1 enumerates
>>>           support for the 1-setting of CR4.CET.
>> Hi, Sean,
>>
>> I sent above change request to Gil(added in cc), but he shared different
>> opinion on this issue:
> Heh, "opinion".
>
>>   It may make things clearer if we document the statement above (all
>> CET-capable parts enumerate IA32_VMX_BASIC[56] as 1).
>>
>> I will see if we can update future revisions of the SDM to clarify this."
> That would be helpful.  Though to be perfectly honest, I simply overlooked the
> existence of IA32_VMX_BASIC[56].
>
> Thanks!
>
>> Then if this is the case,  kvm needs to check IA32_VMX_BASIC[56] before
>> inject exception to nested VM.
>>
>> And this patch could be removed, instead need another patch like below:
>>
>> diff --git a/arch/x86/include/asm/msr-index.h
>> b/arch/x86/include/asm/msr-index.h
>> index ad35355ee43e..6b33aacc8587 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1076,6 +1076,7 @@
>>   #define VMX_BASIC_MEM_TYPE_MASK    0x003c000000000000LLU
>>   #define VMX_BASIC_MEM_TYPE_WB    6LLU
>>   #define VMX_BASIC_INOUT        0x0040000000000000LLU
>> +#define VMX_BASIC_CHECK_ERRCODE    0x0140000000000000LLU
> "Check Error Code" isn't a great description.  The flag enumerates that there the
> CPU does *not* perform consistency checks on the error code when injecting hardware
> exceptions.
>
> So something like this?
>
>    VMX_BASIC_NO_HW_ERROR_CODE_CC
>
> or maybe
>
>    VMX_BASIC_PM_NO_HW_ERROR_CODE_CC
>
> if we want to capture that only protected mode is exempt (I personally prefer
> just VMX_BASIC_NO_HW_ERROR_CODE_CC as "PM" is a bit ambiguous).

I like VMX_BASIC_NO_HW_ERROR_CODE_CC too :-), thanks!

>
>> @@ -2873,7 +2873,8 @@ static int nested_check_vm_entry_controls(struct
>> kvm_vcpu *vcpu,
>>           should_have_error_code =
>>               intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
>>               x86_exception_has_error_code(vector);
>> -        if (CC(has_error_code != should_have_error_code))
>> +        if (!cpu_has_vmx_basic_check_errcode() &&
>> +            CC(has_error_code != should_have_error_code))
> This is wrong on mutiple fronts:
>
>    1. The new feature flag only excempts hardware exceptions delivered to guests
>       with CR0.PE=1.  The above will skip the consistency check for all event injection.
>
>    2. KVM needs to check the CPU model that is exposed to L1, not the capabilities
>       of the host CPU.
>
> Highlighting the key phrases in the SDM:
>
>    The field's deliver-error-code bit (bit 11) is 1 if each of the following holds: (1) the interruption type is
>                                                        ^^^^^^^
>    hardware exception; (2) bit 0 (corresponding to CR0.PE) is set in the CR0 field in the guest-state area;
>    (3) IA32_VMX_BASIC[56] is read as 0 (see Appendix A.1); and (4) the vector indicates one of the following
>    exceptions: #DF (vector 8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), or #AC (17).
>    
>    The field's deliver-error-code bit is 0 if any of the following holds: (1) the interruption type is not hardware
>                                               ^^^^^^
>    exception; (2) bit 0 is clear in the CR0 field in the guest-state area; or (3) IA32_VMX_BASIC[56] is read as
>    0 and the vector is in one of the following ranges: 0–7, 9, 15, 16, or 18–31.
>
> I think what we want is:
>
> 		/* VM-entry interruption-info field: deliver error code */
> 		if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION ||
> 		    !nested_cpu_has_no_hw_error_code_cc(vcpu)) {
> 			should_have_error_code =
> 				intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
> 				x86_exception_has_error_code(vector);
> 			if (CC(has_error_code != should_have_error_code))
> 				return -EINVAL;
> 		}

It looks good to me, will take it, thanks a lot!
Yang, Weijiang July 1, 2023, 1:58 a.m. UTC | #16
On 6/30/2023 11:15 PM, Sean Christopherson wrote:
> On Fri, Jun 30, 2023, Gil Neiger wrote:
>> Intel will not produce any CPU with CET that does not enumerate IA32_VMX_BASIC[56] as 1.
>>
>> One can check that bit before injecting a #CP with error code, but it should
>> not be necessary if CET is enumerated.
>>
>> Of course, if the KVM may run as a guest of another VMM/hypervisor, it may be
>> that the virtual CPU in which KVM operates may enumerate CET but clear the
>> bit in IA32_VMX_BASIC.
> Yeah, I think KVM should be paranoid and expose CET to the guest if and only if
> IA32_VMX_BASIC[56] is 1.  That'll also help validate nested support, e.g. will
> make it more obvious if userspace+KVM provides a  "bad" model to L1.

OK, will do it, thanks you two!
diff mbox series

Patch

diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 7f467fe05d42..1c002abe2be8 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -33,6 +33,7 @@ 
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 96ede74a6067..7bc62cd72748 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2850,7 +2850,7 @@  static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
 		/* VM-entry interruption-info field: deliver error code */
 		should_have_error_code =
 			intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
-			x86_exception_has_error_code(vector);
+			x86_exception_has_error_code(vcpu, vector);
 		if (CC(has_error_code != should_have_error_code))
 			return -EINVAL;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7788646bbf1f..a768cbf3fbb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -520,11 +520,15 @@  EXPORT_SYMBOL_GPL(kvm_spurious_fault);
 #define EXCPT_CONTRIBUTORY	1
 #define EXCPT_PF		2
 
-static int exception_class(int vector)
+static int exception_class(struct kvm_vcpu *vcpu, int vector)
 {
 	switch (vector) {
 	case PF_VECTOR:
 		return EXCPT_PF;
+	case CP_VECTOR:
+		if (vcpu->arch.cr4_guest_rsvd_bits & X86_CR4_CET)
+			return EXCPT_BENIGN;
+		return EXCPT_CONTRIBUTORY;
 	case DE_VECTOR:
 	case TS_VECTOR:
 	case NP_VECTOR:
@@ -707,8 +711,8 @@  static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
 		kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
 		return;
 	}
-	class1 = exception_class(prev_nr);
-	class2 = exception_class(nr);
+	class1 = exception_class(vcpu, prev_nr);
+	class2 = exception_class(vcpu, nr);
 	if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
 	    (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
 		/*
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c544602d07a3..2ba7c7fc4846 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -171,13 +171,20 @@  static inline bool is_64_bit_hypercall(struct kvm_vcpu *vcpu)
 	return vcpu->arch.guest_state_protected || is_64_bit_mode(vcpu);
 }
 
-static inline bool x86_exception_has_error_code(unsigned int vector)
+static inline bool x86_exception_has_error_code(struct kvm_vcpu *vcpu,
+						unsigned int vector)
 {
 	static u32 exception_has_error_code = BIT(DF_VECTOR) | BIT(TS_VECTOR) |
 			BIT(NP_VECTOR) | BIT(SS_VECTOR) | BIT(GP_VECTOR) |
-			BIT(PF_VECTOR) | BIT(AC_VECTOR);
+			BIT(PF_VECTOR) | BIT(AC_VECTOR) | BIT(CP_VECTOR);
 
-	return (1U << vector) & exception_has_error_code;
+	if (!((1U << vector) & exception_has_error_code))
+		return false;
+
+	if (vector == CP_VECTOR)
+		return !(vcpu->arch.cr4_guest_rsvd_bits & X86_CR4_CET);
+
+	return true;
 }
 
 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)