Message ID | 20200928072043.9359-4-krish.sadhukhan@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: nSVM: Add checks for CR3 and CR4 reserved bits to svm_set_nested_state() and test CR3 non-MBZ reserved bits | expand |
On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote: > According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3 > need to be set by software as follows: > > "Reserved Bits. Reserved fields should be cleared to 0 by software > when writing CR3." Nothing in the shortlog or changelog actually states what this patch does. "Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and IIUC, the changelog is straight up misleading. Based on the discussion from v1, I _think_ this test verifies that KVM does _not_ fail nested VMRUN if non-MBZ bits are set, correct? If so, then something like: KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode with further explanation in the changelog would be very helpful.
On 9/28/20 8:11 PM, Sean Christopherson wrote: > On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote: >> According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3 >> need to be set by software as follows: >> >> "Reserved Bits. Reserved fields should be cleared to 0 by software >> when writing CR3." > Nothing in the shortlog or changelog actually states what this patch does. > "Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and > IIUC, the changelog is straight up misleading. > > Based on the discussion from v1, I _think_ this test verifies that KVM does > _not_ fail nested VMRUN if non-MBZ bits are set, correct? Not KVM, hardware rather. Hardware doesn't consider it as an invalid guest state if non-MBZ reserved bits are set. > > If so, then something like: > > KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode > > with further explanation in the changelog would be very helpful. Even though the non-MBZ reserved bits are ignored by the consistency checks in hardware, eventually page-table walks fail. So, I am wondering whether it is appropriate to say, "Verify non-MBZ CR3 reserved bits can be set in long mode" because the test is inducing an artificial failure even before any guest instruction is executed. We are not entering the guest with these bits set. I prefer to keep the commit header as is and rather expand the commit message to explain what I have described here. How about that ?
On Wed, Sep 30, 2020 at 05:29:24PM -0700, Krish Sadhukhan wrote: > > On 9/28/20 8:11 PM, Sean Christopherson wrote: > >On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote: > >>According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3 > >>need to be set by software as follows: > >> > >> "Reserved Bits. Reserved fields should be cleared to 0 by software > >> when writing CR3." > >Nothing in the shortlog or changelog actually states what this patch does. > >"Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and > >IIUC, the changelog is straight up misleading. > > > >Based on the discussion from v1, I _think_ this test verifies that KVM does > >_not_ fail nested VMRUN if non-MBZ bits are set, correct? > > Not KVM, hardware rather. Hardware doesn't consider it as an invalid guest > state if non-MBZ reserved bits are set. > > > >If so, then something like: > > > > KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode > > > >with further explanation in the changelog would be very helpful. > > Even though the non-MBZ reserved bits are ignored by the consistency checks > in hardware, eventually page-table walks fail. So, I am wondering whether it Page table walks fail how? Are you referring to forcing the #NPF, or does the CPU puke on the non-MBZ reserved bits at some point? > is appropriate to say, > > "Verify non-MBZ CR3 reserved bits can be set in long mode" > > because the test is inducing an artificial failure even before any guest > instruction is executed. We are not entering the guest with these bits set. Yes we are, unless I'm misunderstanding how SVM handles VMRUN. "entering" the guest does not mean successfully executing guest code, it means loading guest state and completing the world switch. I don't think I'm misunderstanding, because the test explicitly clears the NPT PML4[0]'s present bit to induce a #NPF. That means the CPU is fetching instructions, and again unless there's details about NPT that I'm missing, the fact that the test sees a #NPF means that the CPU successfully completed the GVA->GPA translation using the "bad" CR3. > I prefer to keep the commit header as is and rather expand the commit > message to explain what I have described here. How about that ? That's fine, so long as it documents both what the test is actually verifying and what is/isn't legal.
On 9/30/20 5:50 PM, Sean Christopherson wrote: > On Wed, Sep 30, 2020 at 05:29:24PM -0700, Krish Sadhukhan wrote: >> On 9/28/20 8:11 PM, Sean Christopherson wrote: >>> On Mon, Sep 28, 2020 at 07:20:42AM +0000, Krish Sadhukhan wrote: >>>> According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3 >>>> need to be set by software as follows: >>>> >>>> "Reserved Bits. Reserved fields should be cleared to 0 by software >>>> when writing CR3." >>> Nothing in the shortlog or changelog actually states what this patch does. >>> "Test non-MBZ reserved bits in CR3 in long mode" is rather ambiguous, and >>> IIUC, the changelog is straight up misleading. >>> >>> Based on the discussion from v1, I _think_ this test verifies that KVM does >>> _not_ fail nested VMRUN if non-MBZ bits are set, correct? >> Not KVM, hardware rather. Hardware doesn't consider it as an invalid guest >> state if non-MBZ reserved bits are set. >>> If so, then something like: >>> >>> KVM: nSVM: Verify non-MBZ CR3 reserved bits can be set in long mode >>> >>> with further explanation in the changelog would be very helpful. >> Even though the non-MBZ reserved bits are ignored by the consistency checks >> in hardware, eventually page-table walks fail. So, I am wondering whether it > Page table walks fail how? Are you referring to forcing the #NPF, or does > the CPU puke on the non-MBZ reserved bits at some point? I notice the following in my experiments when I don't clear the P bit in NPT PML4[0] to induce an #NPF: 1. In long mode and in legacy-PAE mode, guest VMMCALL is successfully executed and kvm_x86_ops.handle_exit() receives VMEXIT_VMMCALL. 2. In legacy-non-PAE mode, kvm_x86_ops.handle_exit(), receives VMEXIT_NPF. > >> is appropriate to say, >> >> "Verify non-MBZ CR3 reserved bits can be set in long mode" >> >> because the test is inducing an artificial failure even before any guest >> instruction is executed. We are not entering the guest with these bits set. > Yes we are, unless I'm misunderstanding how SVM handles VMRUN. "entering" the > guest does not mean successfully executing guest code, it means loading guest > state and completing the world switch. I don't think I'm misunderstanding, > because the test explicitly clears the NPT PML4[0]'s present bit to induce a > #NPF. That means the CPU is fetching instructions, and again unless there's > details about NPT that I'm missing, the fact that the test sees a #NPF means > that the CPU successfully completed the GVA->GPA translation using the "bad" > CR3. You are right. According to APM, the hardware loads the guest state first and then does the consistency checking. So, yes, world switch happens before consistency checking begins. > >> I prefer to keep the commit header as is and rather expand the commit >> message to explain what I have described here. How about that ? > That's fine, so long as it documents both what the test is actually verifying > and what is/isn't legal.
diff --git a/x86/svm.h b/x86/svm.h index 15e0f18..465d794 100644 --- a/x86/svm.h +++ b/x86/svm.h @@ -325,7 +325,8 @@ struct __attribute__ ((__packed__)) vmcb { #define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP) #define SVM_CR0_RESERVED_MASK 0xffffffff00000000U -#define SVM_CR3_LONG_RESERVED_MASK 0xfff0000000000000U +#define SVM_CR3_LONG_MBZ_MASK 0xfff0000000000000U +#define SVM_CR3_LONG_RESERVED_MASK 0x0000000000000fe7U #define SVM_CR4_LEGACY_RESERVED_MASK 0xff88f000U #define SVM_CR4_RESERVED_MASK 0xffffffffff88f000U #define SVM_DR6_RESERVED_MASK 0xffffffffffff1ff0U diff --git a/x86/svm_tests.c b/x86/svm_tests.c index 1908c7c..6c97ee3 100644 --- a/x86/svm_tests.c +++ b/x86/svm_tests.c @@ -1913,7 +1913,8 @@ static void basic_guest_main(struct svm_test *test) } \ } -#define SVM_TEST_CR_RESERVED_BITS(start, end, inc, cr, val, resv_mask) \ +#define SVM_TEST_CR_RESERVED_BITS(start, end, inc, cr, val, resv_mask, \ + exit_code) \ { \ u64 tmp, mask; \ int i; \ @@ -1933,7 +1934,7 @@ static void basic_guest_main(struct svm_test *test) case 4: \ vmcb->save.cr4 = tmp; \ } \ - report(svm_vmrun() == SVM_EXIT_ERR, "Test CR%d %d:%d: %lx",\ + report(svm_vmrun() == exit_code, "Test CR%d %d:%d: %lx",\ cr, end, start, tmp); \ } \ } @@ -2012,9 +2013,48 @@ static void test_cr3(void) u64 cr3_saved = vmcb->save.cr3; SVM_TEST_CR_RESERVED_BITS(0, 63, 1, 3, cr3_saved, - SVM_CR3_LONG_RESERVED_MASK); + SVM_CR3_LONG_MBZ_MASK, SVM_EXIT_ERR); + + vmcb->save.cr3 = cr3_saved & ~SVM_CR3_LONG_MBZ_MASK; + report(svm_vmrun() == SVM_EXIT_VMMCALL, "Test CR3 63:0: %lx", + vmcb->save.cr3); + + /* + * CR3 non-MBZ reserved bits based on different modes: + * [11:5] [2:0] - long mode + */ + u64 cr4_saved = vmcb->save.cr4; + + /* + * Long mode + */ + if (this_cpu_has(X86_FEATURE_PCID)) { + vmcb->save.cr4 = cr4_saved | X86_CR4_PCIDE; + SVM_TEST_CR_RESERVED_BITS(0, 11, 1, 3, cr3_saved, + SVM_CR3_LONG_RESERVED_MASK, SVM_EXIT_VMMCALL); + + vmcb->save.cr3 = cr3_saved & ~SVM_CR3_LONG_RESERVED_MASK; + report(svm_vmrun() == SVM_EXIT_VMMCALL, "Test CR3 63:0: %lx", + vmcb->save.cr3); + } else { + u64 *pdpe = npt_get_pml4e(); + + vmcb->save.cr4 = cr4_saved & ~X86_CR4_PCIDE; + + /* Clear P (Present) bit in NPT in order to trigger #NPF */ + pdpe[0] &= ~1ULL; + + SVM_TEST_CR_RESERVED_BITS(0, 11, 1, 3, cr3_saved, + SVM_CR3_LONG_RESERVED_MASK, SVM_EXIT_NPF); + + pdpe[0] |= 1ULL; + vmcb->save.cr3 = cr3_saved & ~SVM_CR3_LONG_RESERVED_MASK; + report(svm_vmrun() == SVM_EXIT_VMMCALL, "Test CR3 63:0: %lx", + vmcb->save.cr3); + } vmcb->save.cr3 = cr3_saved; + vmcb->save.cr4 = cr4_saved; } static void test_cr4(void) @@ -2031,14 +2071,14 @@ static void test_cr4(void) efer &= ~EFER_LME; vmcb->save.efer = efer; SVM_TEST_CR_RESERVED_BITS(12, 31, 1, 4, cr4_saved, - SVM_CR4_LEGACY_RESERVED_MASK); + SVM_CR4_LEGACY_RESERVED_MASK, SVM_EXIT_ERR); efer |= EFER_LME; vmcb->save.efer = efer; SVM_TEST_CR_RESERVED_BITS(12, 31, 1, 4, cr4_saved, - SVM_CR4_RESERVED_MASK); + SVM_CR4_RESERVED_MASK, SVM_EXIT_ERR); SVM_TEST_CR_RESERVED_BITS(32, 63, 4, 4, cr4_saved, - SVM_CR4_RESERVED_MASK); + SVM_CR4_RESERVED_MASK, SVM_EXIT_ERR); vmcb->save.cr4 = cr4_saved; vmcb->save.efer = efer_saved;
According to section "CR3" in APM vol. 2, the non-MBZ reserved bits in CR3 need to be set by software as follows: "Reserved Bits. Reserved fields should be cleared to 0 by software when writing CR3." Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> --- x86/svm.h | 3 ++- x86/svm_tests.c | 52 +++++++++++++++++++++++++++++++++++++++++++------ 2 files changed, 48 insertions(+), 7 deletions(-)