diff mbox series

[3/4,v2] KVM: nSVM: Test non-MBZ reserved bits in CR3 in long mode

Message ID 20200928072043.9359-4-krish.sadhukhan@oracle.com (mailing list archive)
State New, archived
Headers show
Series KVM: nSVM: Add checks for CR3 and CR4 reserved bits to svm_set_nested_state() and test CR3 non-MBZ reserved bits | expand

Commit Message

Krish Sadhukhan Sept. 28, 2020, 7:20 a.m. UTC
According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3
need to be set by software as follows:

	"Reserved Bits. Reserved fields should be cleared to 0 by software
	when writing CR3."

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
---
 x86/svm.h       |  3 ++-
 x86/svm_tests.c | 52 +++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 48 insertions(+), 7 deletions(-)

Comments

Sean Christopherson Sept. 29, 2020, 3:11 a.m. UTC | #1
On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote:
> According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3
> need to be set by software as follows:
> 
> 	"Reserved Bits. Reserved fields should be cleared to 0 by software
> 	when writing CR3."

Nothing in the shortlog or changelog actually states what this patch does.
"Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and
IIUC, the changelog is straight up misleading.

Based on the discussion from v1, I _think_ this test verifies that KVM does
_not_ fail nested VMRUN if non-MBZ bits are set, correct?

If so, then something like:

  KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode

with further explanation in the changelog would be very helpful.
Krish Sadhukhan Oct. 1, 2020, 12:29 a.m. UTC | #2
On 9/28/20 8:11 PM, Sean Christopherson wrote:
> On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote:
>> According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3
>> need to be set by software as follows:
>>
>> 	"Reserved Bits. Reserved fields should be cleared to 0 by software
>> 	when writing CR3."
> Nothing in the shortlog or changelog actually states what this patch does.
> "Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and
> IIUC, the changelog is straight up misleading.
>
> Based on the discussion from v1, I _think_ this test verifies that KVM does
> _not_ fail nested VMRUN if non-MBZ bits are set, correct?

Not KVM, hardware rather.  Hardware doesn't consider it as an invalid 
guest state if non-MBZ reserved bits are set.
>
> If so, then something like:
>
>    KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode
>
> with further explanation in the changelog would be very helpful.

Even though the non-MBZ reserved bits are ignored by the consistency 
checks in hardware, eventually page-table walks fail. So, I am wondering 
whether it is appropriate to say,

             "Verify non-MBZ CR3 reserved bits can be set in long mode"

because the test is inducing an artificial failure even before any guest 
instruction is executed. We are not entering the guest with these bits set.

I prefer to keep the commit header as is and rather expand the commit 
message to explain what I have described here. How about that ?
Sean Christopherson Oct. 1, 2020, 12:50 a.m. UTC | #3
On Wed, Sep 30, 2020 at 05:29:24PM -0700, Krish Sadhukhan wrote:
> 
> On 9/28/20 8:11 PM, Sean Christopherson wrote:
> >On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote:
> >>According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3
> >>need to be set by software as follows:
> >>
> >>	"Reserved Bits. Reserved fields should be cleared to 0 by software
> >>	when writing CR3."
> >Nothing in the shortlog or changelog actually states what this patch does.
> >"Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and
> >IIUC, the changelog is straight up misleading.
> >
> >Based on the discussion from v1, I _think_ this test verifies that KVM does
> >_not_ fail nested VMRUN if non-MBZ bits are set, correct?
> 
> Not KVM, hardware rather.  Hardware doesn't consider it as an invalid guest
> state if non-MBZ reserved bits are set.
> >
> >If so, then something like:
> >
> >   KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode
> >
> >with further explanation in the changelog would be very helpful.
> 
> Even though the non-MBZ reserved bits are ignored by the consistency checks
> in hardware, eventually page-table walks fail. So, I am wondering whether it

Page table walks fail how?  Are you referring to forcing the #NPF, or does
the CPU puke on the non-MBZ reserved bits at some point?

> is appropriate to say,
> 
>             "Verify non-MBZ CR3 reserved bits can be set in long mode"
> 
> because the test is inducing an artificial failure even before any guest
> instruction is executed. We are not entering the guest with these bits set.

Yes we are, unless I'm misunderstanding how SVM handles VMRUN.  "entering" the
guest does not mean successfully executing guest code, it means loading guest
state and completing the world switch.  I don't think I'm misunderstanding,
because the test explicitly clears the NPT PML4[0]'s present bit to induce a
#NPF.  That means the CPU is fetching instructions, and again unless there's
details about NPT that I'm missing, the fact that the test sees a #NPF means
that the CPU successfully completed the GVA->GPA translation using the "bad"
CR3.

> I prefer to keep the commit header as is and rather expand the commit
> message to explain what I have described here. How about that ?

That's fine, so long as it documents both what the test is actually verifying
and what is/isn't legal.
Krish Sadhukhan Oct. 5, 2020, 7:10 p.m. UTC | #4
On 9/30/20 5:50 PM, Sean Christopherson wrote:
> On Wed, Sep 30, 2020 at 05:29:24PM -0700, Krish Sadhukhan wrote:
>> On 9/28/20 8:11 PM, Sean Christopherson wrote:
>>> On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote:
>>>> According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3
>>>> need to be set by software as follows:
>>>>
>>>> 	"Reserved Bits. Reserved fields should be cleared to 0 by software
>>>> 	when writing CR3."
>>> Nothing in the shortlog or changelog actually states what this patch does.
>>> "Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and
>>> IIUC, the changelog is straight up misleading.
>>>
>>> Based on the discussion from v1, I _think_ this test verifies that KVM does
>>> _not_ fail nested VMRUN if non-MBZ bits are set, correct?
>> Not KVM, hardware rather.  Hardware doesn't consider it as an invalid guest
>> state if non-MBZ reserved bits are set.
>>> If so, then something like:
>>>
>>>    KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode
>>>
>>> with further explanation in the changelog would be very helpful.
>> Even though the non-MBZ reserved bits are ignored by the consistency checks
>> in hardware, eventually page-table walks fail. So, I am wondering whether it
> Page table walks fail how?  Are you referring to forcing the #NPF, or does
> the CPU puke on the non-MBZ reserved bits at some point?


I notice the following in my experiments when I don't clear the P bit in 
NPT PML4[0] to induce an
#NPF:

     1. In long mode and in legacy-PAE mode, guest VMMCALL is 
successfully executed and kvm_x86_ops.handle_exit() receives VMEXIT_VMMCALL.

     2. In legacy-non-PAE mode, kvm_x86_ops.handle_exit(), receives 
VMEXIT_NPF.

>
>> is appropriate to say,
>>
>>              "Verify non-MBZ CR3 reserved bits can be set in long mode"
>>
>> because the test is inducing an artificial failure even before any guest
>> instruction is executed. We are not entering the guest with these bits set.
> Yes we are, unless I'm misunderstanding how SVM handles VMRUN.  "entering" the
> guest does not mean successfully executing guest code, it means loading guest
> state and completing the world switch.  I don't think I'm misunderstanding,
> because the test explicitly clears the NPT PML4[0]'s present bit to induce a
> #NPF.  That means the CPU is fetching instructions, and again unless there's
> details about NPT that I'm missing, the fact that the test sees a #NPF means
> that the CPU successfully completed the GVA->GPA translation using the "bad"
> CR3.


You are right. According to APM, the hardware loads the guest state 
first and then does the consistency checking. So, yes, world switch 
happens before consistency checking begins.

>
>> I prefer to keep the commit header as is and rather expand the commit
>> message to explain what I have described here. How about that ?
> That's fine, so long as it documents both what the test is actually verifying
> and what is/isn't legal.
diff mbox series

Patch

diff --git a/x86/svm.h b/x86/svm.h
index 15e0f18..465d794 100644
--- a/x86/svm.h
+++ b/x86/svm.h
@@ -325,7 +325,8 @@  struct __attribute__ ((__packed__)) vmcb {
 #define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
 
 #define	SVM_CR0_RESERVED_MASK			0xffffffff00000000U
-#define	SVM_CR3_LONG_RESERVED_MASK		0xfff0000000000000U
+#define	SVM_CR3_LONG_MBZ_MASK			0xfff0000000000000U
+#define	SVM_CR3_LONG_RESERVED_MASK		0x0000000000000fe7U
 #define	SVM_CR4_LEGACY_RESERVED_MASK		0xff88f000U
 #define	SVM_CR4_RESERVED_MASK			0xffffffffff88f000U
 #define	SVM_DR6_RESERVED_MASK			0xffffffffffff1ff0U
diff --git a/x86/svm_tests.c b/x86/svm_tests.c
index 1908c7c..6c97ee3 100644
--- a/x86/svm_tests.c
+++ b/x86/svm_tests.c
@@ -1913,7 +1913,8 @@  static void basic_guest_main(struct svm_test *test)
         }								\
 }
 
-#define SVM_TEST_CR_RESERVED_BITS(start, end, inc, cr, val, resv_mask)	\
+#define SVM_TEST_CR_RESERVED_BITS(start, end, inc, cr, val, resv_mask,	\
+				  exit_code)				\
 {									\
 	u64 tmp, mask;							\
 	int i;								\
@@ -1933,7 +1934,7 @@  static void basic_guest_main(struct svm_test *test)
 		case 4:							\
 			vmcb->save.cr4 = tmp;				\
 		}							\
-		report(svm_vmrun() == SVM_EXIT_ERR, "Test CR%d %d:%d: %lx",\
+		report(svm_vmrun() == exit_code, "Test CR%d %d:%d: %lx",\
 		    cr, end, start, tmp);				\
 	}								\
 }
@@ -2012,9 +2013,48 @@  static void test_cr3(void)
 	u64 cr3_saved = vmcb->save.cr3;
 
 	SVM_TEST_CR_RESERVED_BITS(0, 63, 1, 3, cr3_saved,
-	    SVM_CR3_LONG_RESERVED_MASK);
+	    SVM_CR3_LONG_MBZ_MASK, SVM_EXIT_ERR);
+
+	vmcb->save.cr3 = cr3_saved & ~SVM_CR3_LONG_MBZ_MASK;
+	report(svm_vmrun() == SVM_EXIT_VMMCALL, "Test CR3 63:0: %lx",
+	    vmcb->save.cr3);
+
+	/*
+	 * CR3 non-MBZ reserved bits based on different modes:
+	 *   [11:5] [2:0] - long mode
+	 */
+	u64 cr4_saved = vmcb->save.cr4;
+
+	/*
+	 * Long mode
+	 */
+	if (this_cpu_has(X86_FEATURE_PCID)) {
+		vmcb->save.cr4 = cr4_saved | X86_CR4_PCIDE;
+		SVM_TEST_CR_RESERVED_BITS(0, 11, 1, 3, cr3_saved,
+		    SVM_CR3_LONG_RESERVED_MASK, SVM_EXIT_VMMCALL);
+
+		vmcb->save.cr3 = cr3_saved & ~SVM_CR3_LONG_RESERVED_MASK;
+		report(svm_vmrun() == SVM_EXIT_VMMCALL, "Test CR3 63:0: %lx",
+		    vmcb->save.cr3);
+	} else {
+		u64 *pdpe = npt_get_pml4e();
+
+		vmcb->save.cr4 = cr4_saved & ~X86_CR4_PCIDE;
+
+		/* Clear P (Present) bit in NPT in order to trigger #NPF */
+		pdpe[0] &= ~1ULL;
+
+		SVM_TEST_CR_RESERVED_BITS(0, 11, 1, 3, cr3_saved,
+		    SVM_CR3_LONG_RESERVED_MASK, SVM_EXIT_NPF);
+
+		pdpe[0] |= 1ULL;
+		vmcb->save.cr3 = cr3_saved & ~SVM_CR3_LONG_RESERVED_MASK;
+		report(svm_vmrun() == SVM_EXIT_VMMCALL, "Test CR3 63:0: %lx",
+		    vmcb->save.cr3);
+	}
 
 	vmcb->save.cr3 = cr3_saved;
+	vmcb->save.cr4 = cr4_saved;
 }
 
 static void test_cr4(void)
@@ -2031,14 +2071,14 @@  static void test_cr4(void)
 	efer &= ~EFER_LME;
 	vmcb->save.efer = efer;
 	SVM_TEST_CR_RESERVED_BITS(12, 31, 1, 4, cr4_saved,
-	    SVM_CR4_LEGACY_RESERVED_MASK);
+	    SVM_CR4_LEGACY_RESERVED_MASK, SVM_EXIT_ERR);
 
 	efer |= EFER_LME;
 	vmcb->save.efer = efer;
 	SVM_TEST_CR_RESERVED_BITS(12, 31, 1, 4, cr4_saved,
-	    SVM_CR4_RESERVED_MASK);
+	    SVM_CR4_RESERVED_MASK, SVM_EXIT_ERR);
 	SVM_TEST_CR_RESERVED_BITS(32, 63, 4, 4, cr4_saved,
-	    SVM_CR4_RESERVED_MASK);
+	    SVM_CR4_RESERVED_MASK, SVM_EXIT_ERR);
 
 	vmcb->save.cr4 = cr4_saved;
 	vmcb->save.efer = efer_saved;