Message ID | 20190925011821.24523-2-marcorr@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [kvm-unit-tests,v7,1/2] x86: nvmx: fix bug in __enter_guest() | expand |
On Tue, Sep 24, 2019 at 06:18:21PM -0700, Marc Orr wrote: > Excercise nested VMX's atomic MSR switch code (e.g., VM-entry MSR-load > list) at the maximum number of MSRs supported, as described in the SDM, > in the appendix chapter titled "MISCELLANEOUS DATA". > > Suggested-by: Jim Mattson <jmattson@google.com> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> > Signed-off-by: Marc Orr <marcorr@google.com> > --- Thanks for being patient and fixing all the nits! Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
On 25/09/19 03:18, Marc Orr wrote: > diff --git a/x86/unittests.cfg b/x86/unittests.cfg > index 694ee3d42f3a..05122cf91ea1 100644 > --- a/x86/unittests.cfg > +++ b/x86/unittests.cfg > @@ -227,7 +227,7 @@ extra_params = -cpu qemu64,+umip > > [vmx] > file = vmx.flat > -extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test" > +extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test -atomic_switch_overflow_msrs_test" > arch = x86_64 > groups = vmx I just noticed this, why is the test disabled by default? Paolo
On Thu, Sep 26, 2019 at 11:24:57AM +0200, Paolo Bonzini wrote: > On 25/09/19 03:18, Marc Orr wrote: > > diff --git a/x86/unittests.cfg b/x86/unittests.cfg > > index 694ee3d42f3a..05122cf91ea1 100644 > > --- a/x86/unittests.cfg > > +++ b/x86/unittests.cfg > > @@ -227,7 +227,7 @@ extra_params = -cpu qemu64,+umip > > > > [vmx] > > file = vmx.flat > > -extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test" > > +extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test -atomic_switch_overflow_msrs_test" > > arch = x86_64 > > groups = vmx > > I just noticed this, why is the test disabled by default? The negative test triggers undefined behavior, e.g. on bare metal the test would fail because VM-Enter would succeed due to lack of an explicit check on the MSR count. Since the test relies on somehwat arbitrary KVM behavior, we made it opt-in.
> On 26 Sep 2019, at 17:32, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Thu, Sep 26, 2019 at 11:24:57AM +0200, Paolo Bonzini wrote: >> On 25/09/19 03:18, Marc Orr wrote: >>> diff --git a/x86/unittests.cfg b/x86/unittests.cfg >>> index 694ee3d42f3a..05122cf91ea1 100644 >>> --- a/x86/unittests.cfg >>> +++ b/x86/unittests.cfg >>> @@ -227,7 +227,7 @@ extra_params = -cpu qemu64,+umip >>> >>> [vmx] >>> file = vmx.flat >>> -extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test" >>> +extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test -atomic_switch_overflow_msrs_test" >>> arch = x86_64 >>> groups = vmx >> >> I just noticed this, why is the test disabled by default? > > The negative test triggers undefined behavior, e.g. on bare metal the > test would fail because VM-Enter would succeed due to lack of an explicit > check on the MSR count. > > Since the test relies on somehwat arbitrary KVM behavior, we made it opt-in. Just note that when commit 5ac120c23753 ("x86: vmx: Test INIT processing during various CPU VMX states”) was merged to master, it was changed to accidentally remove “-atomic_switch_overflow_msrs_test”. (Probably a merge mistake). So this should be fixed by a new patch :P -Liran
On 26/09/19 16:35, Liran Alon wrote: > > >> On 26 Sep 2019, at 17:32, Sean Christopherson <sean.j.christopherson@intel.com> wrote: >> >> On Thu, Sep 26, 2019 at 11:24:57AM +0200, Paolo Bonzini wrote: >>> On 25/09/19 03:18, Marc Orr wrote: >>>> diff --git a/x86/unittests.cfg b/x86/unittests.cfg >>>> index 694ee3d42f3a..05122cf91ea1 100644 >>>> --- a/x86/unittests.cfg >>>> +++ b/x86/unittests.cfg >>>> @@ -227,7 +227,7 @@ extra_params = -cpu qemu64,+umip >>>> >>>> [vmx] >>>> file = vmx.flat >>>> -extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test" >>>> +extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test -atomic_switch_overflow_msrs_test" >>>> arch = x86_64 >>>> groups = vmx >>> >>> I just noticed this, why is the test disabled by default? >> >> The negative test triggers undefined behavior, e.g. on bare metal the >> test would fail because VM-Enter would succeed due to lack of an explicit >> check on the MSR count. >> >> Since the test relies on somehwat arbitrary KVM behavior, we made it opt-in. > > Just note that when commit 5ac120c23753 ("x86: vmx: Test INIT processing during various CPU VMX states”) > was merged to master, it was changed to accidentally remove “-atomic_switch_overflow_msrs_test”. > (Probably a merge mistake). > > So this should be fixed by a new patch :P I will just rewrite history and fix the mistake. Paolo
> On Sep 26, 2019, at 7:32 AM, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Thu, Sep 26, 2019 at 11:24:57AM +0200, Paolo Bonzini wrote: >> On 25/09/19 03:18, Marc Orr wrote: >>> diff --git a/x86/unittests.cfg b/x86/unittests.cfg >>> index 694ee3d42f3a..05122cf91ea1 100644 >>> --- a/x86/unittests.cfg >>> +++ b/x86/unittests.cfg >>> @@ -227,7 +227,7 @@ extra_params = -cpu qemu64,+umip >>> >>> [vmx] >>> file = vmx.flat >>> -extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test" >>> +extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test -atomic_switch_overflow_msrs_test" >>> arch = x86_64 >>> groups = vmx >> >> I just noticed this, why is the test disabled by default? > > The negative test triggers undefined behavior, e.g. on bare metal the > test would fail because VM-Enter would succeed due to lack of an explicit > check on the MSR count. > > Since the test relies on somehwat arbitrary KVM behavior, we made it opt-in. Thanks for caring, but it would be better to explicitly skip the test if it is not running on bare-metal. For instance, I missed this thread and needed to check why the test fails on bare-metal... Besides, it seems that v6 was used and not v7, so the error messages are strange: Test suite: atomic_switch_overflow_msrs_test FAIL: exit_reason, 18, is 2147483682. FAIL: exit_qual, 0, is 513. SUMMARY: 11 tests, 2 unexpected failures I also think that printing the exit-reason in hex format would be more readable.
> Thanks for caring, but it would be better to explicitly skip the test if it > is not running on bare-metal. For instance, I missed this thread and needed > to check why the test fails on bare-metal... > > Besides, it seems that v6 was used and not v7, so the error messages are > strange: > > Test suite: atomic_switch_overflow_msrs_test > FAIL: exit_reason, 18, is 2147483682. > FAIL: exit_qual, 0, is 513. > SUMMARY: 11 tests, 2 unexpected failures > > I also think that printing the exit-reason in hex format would be more > readable. Exit reasons are enumerated in decimal rather than hex in the SDM (volume 3, appendix C). To be clear, are you saying you "opted in" to the test on bare metal, and got confused when it failed? Or, are you saying that our patch on unittest.cfg to make the test not run by default didn't work?
> On Sep 30, 2019, at 4:34 PM, Marc Orr <marcorr@google.com> wrote: > >> Thanks for caring, but it would be better to explicitly skip the test if it >> is not running on bare-metal. For instance, I missed this thread and needed >> to check why the test fails on bare-metal... >> >> Besides, it seems that v6 was used and not v7, so the error messages are >> strange: >> >> Test suite: atomic_switch_overflow_msrs_test >> FAIL: exit_reason, 18, is 2147483682. >> FAIL: exit_qual, 0, is 513. >> SUMMARY: 11 tests, 2 unexpected failures >> >> I also think that printing the exit-reason in hex format would be more >> readable. > > Exit reasons are enumerated in decimal rather than hex in the SDM > (volume 3, appendix C). I know, but when the failed VM entry indication is on, it is just a huge mess. Never mind, this is a minor issue. > To be clear, are you saying you "opted in" to the test on bare metal, > and got confused when it failed? Or, are you saying that our patch on > unittest.cfg to make the test not run by default didn't work? I ran it on bare-metal and needed to spend some time to realize that it is expected to fail on bare-metal “by design”.
> >> Thanks for caring, but it would be better to explicitly skip the test if it > >> is not running on bare-metal. For instance, I missed this thread and needed > >> to check why the test fails on bare-metal... > >> > >> Besides, it seems that v6 was used and not v7, so the error messages are > >> strange: > >> > >> Test suite: atomic_switch_overflow_msrs_test > >> FAIL: exit_reason, 18, is 2147483682. > >> FAIL: exit_qual, 0, is 513. > >> SUMMARY: 11 tests, 2 unexpected failures > >> > >> I also think that printing the exit-reason in hex format would be more > >> readable. > > > > Exit reasons are enumerated in decimal rather than hex in the SDM > > (volume 3, appendix C). > > I know, but when the failed VM entry indication is on, it is just a huge > mess. Never mind, this is a minor issue. > > > To be clear, are you saying you "opted in" to the test on bare metal, > > and got confused when it failed? Or, are you saying that our patch on > > unittest.cfg to make the test not run by default didn't work? > > I ran it on bare-metal and needed to spend some time to realize that it is > expected to fail on bare-metal “by design”. Ack. Maybe we should move tests like this into a *_virt_only.c counter-part? E.g., we could create a new, opt-in, file, vmx_tests_virt_only.c for this test. When similar scenarios arise in the future, this new precedent could be replicated, to make it obvious which tests are expected to fail on bare metal.
> On Sep 30, 2019, at 5:03 PM, Marc Orr <marcorr@google.com> wrote: > >>>> Thanks for caring, but it would be better to explicitly skip the test if it >>>> is not running on bare-metal. For instance, I missed this thread and needed >>>> to check why the test fails on bare-metal... >>>> >>>> Besides, it seems that v6 was used and not v7, so the error messages are >>>> strange: >>>> >>>> Test suite: atomic_switch_overflow_msrs_test >>>> FAIL: exit_reason, 18, is 2147483682. >>>> FAIL: exit_qual, 0, is 513. >>>> SUMMARY: 11 tests, 2 unexpected failures >>>> >>>> I also think that printing the exit-reason in hex format would be more >>>> readable. >>> >>> Exit reasons are enumerated in decimal rather than hex in the SDM >>> (volume 3, appendix C). >> >> I know, but when the failed VM entry indication is on, it is just a huge >> mess. Never mind, this is a minor issue. >> >>> To be clear, are you saying you "opted in" to the test on bare metal, >>> and got confused when it failed? Or, are you saying that our patch on >>> unittest.cfg to make the test not run by default didn't work? >> >> I ran it on bare-metal and needed to spend some time to realize that it is >> expected to fail on bare-metal “by design”. > > Ack. Maybe we should move tests like this into a *_virt_only.c > counter-part? E.g., we could create a new, opt-in, file, > vmx_tests_virt_only.c for this test. When similar scenarios arise in > the future, this new precedent could be replicated, to make it obvious > which tests are expected to fail on bare metal. Thanks for the willingness, but I don’t know whether any intrusive change is needed at the moment. Even just getting the print-out to have something like “(KVM-specific)” comment would be enough, assuming the test can easily be disabled (as is the case with atomic_switch_overflow_msrs_test).
> >>>> Thanks for caring, but it would be better to explicitly skip the test if it > >>>> is not running on bare-metal. For instance, I missed this thread and needed > >>>> to check why the test fails on bare-metal... > >>>> > >>>> Besides, it seems that v6 was used and not v7, so the error messages are > >>>> strange: > >>>> > >>>> Test suite: atomic_switch_overflow_msrs_test > >>>> FAIL: exit_reason, 18, is 2147483682. > >>>> FAIL: exit_qual, 0, is 513. > >>>> SUMMARY: 11 tests, 2 unexpected failures > >>>> > >>>> I also think that printing the exit-reason in hex format would be more > >>>> readable. > >>> > >>> Exit reasons are enumerated in decimal rather than hex in the SDM > >>> (volume 3, appendix C). > >> > >> I know, but when the failed VM entry indication is on, it is just a huge > >> mess. Never mind, this is a minor issue. > >> > >>> To be clear, are you saying you "opted in" to the test on bare metal, > >>> and got confused when it failed? Or, are you saying that our patch on > >>> unittest.cfg to make the test not run by default didn't work? > >> > >> I ran it on bare-metal and needed to spend some time to realize that it is > >> expected to fail on bare-metal “by design”. > > > > Ack. Maybe we should move tests like this into a *_virt_only.c > > counter-part? E.g., we could create a new, opt-in, file, > > vmx_tests_virt_only.c for this test. When similar scenarios arise in > > the future, this new precedent could be replicated, to make it obvious > > which tests are expected to fail on bare metal. > > Thanks for the willingness, but I don’t know whether any intrusive change is > needed at the moment. Even just getting the print-out to have something like > “(KVM-specific)” comment would be enough, assuming the test can easily be > disabled (as is the case with atomic_switch_overflow_msrs_test). "(as is the case with atomic_switch_overflow_msrs_test)" --> now I'm confused. Are you saying that the first test, atomic_switch_max_msrs_test() failed on bare metal? That test is expected to pass on bare metal. However, I did not test it on bare metal.
> On Sep 30, 2019, at 5:12 PM, Marc Orr <marcorr@google.com> wrote: > >>>>>> Thanks for caring, but it would be better to explicitly skip the test if it >>>>>> is not running on bare-metal. For instance, I missed this thread and needed >>>>>> to check why the test fails on bare-metal... >>>>>> >>>>>> Besides, it seems that v6 was used and not v7, so the error messages are >>>>>> strange: >>>>>> >>>>>> Test suite: atomic_switch_overflow_msrs_test >>>>>> FAIL: exit_reason, 18, is 2147483682. >>>>>> FAIL: exit_qual, 0, is 513. >>>>>> SUMMARY: 11 tests, 2 unexpected failures >>>>>> >>>>>> I also think that printing the exit-reason in hex format would be more >>>>>> readable. >>>>> >>>>> Exit reasons are enumerated in decimal rather than hex in the SDM >>>>> (volume 3, appendix C). >>>> >>>> I know, but when the failed VM entry indication is on, it is just a huge >>>> mess. Never mind, this is a minor issue. >>>> >>>>> To be clear, are you saying you "opted in" to the test on bare metal, >>>>> and got confused when it failed? Or, are you saying that our patch on >>>>> unittest.cfg to make the test not run by default didn't work? >>>> >>>> I ran it on bare-metal and needed to spend some time to realize that it is >>>> expected to fail on bare-metal “by design”. >>> >>> Ack. Maybe we should move tests like this into a *_virt_only.c >>> counter-part? E.g., we could create a new, opt-in, file, >>> vmx_tests_virt_only.c for this test. When similar scenarios arise in >>> the future, this new precedent could be replicated, to make it obvious >>> which tests are expected to fail on bare metal. >> >> Thanks for the willingness, but I don’t know whether any intrusive change is >> needed at the moment. Even just getting the print-out to have something like >> “(KVM-specific)” comment would be enough, assuming the test can easily be >> disabled (as is the case with atomic_switch_overflow_msrs_test). > > "(as is the case with atomic_switch_overflow_msrs_test)" --> now I'm > confused. Are you saying that the first test, > atomic_switch_max_msrs_test() failed on bare metal? That test is > expected to pass on bare metal. However, I did not test it on bare > metal. Sorry, my bad: atomic_switch_max_msrs_test() passes on bare-metal. atomic_switch_overflow_msrs_test() fails. Everything is as (I understand is) expected to be. I just copy-pasted the wrong test name in my last reply.
diff --git a/lib/alloc_page.c b/lib/alloc_page.c index 97d13395ff08..ed236389537e 100644 --- a/lib/alloc_page.c +++ b/lib/alloc_page.c @@ -53,6 +53,11 @@ void free_pages(void *mem, unsigned long size) spin_unlock(&lock); } +void free_pages_by_order(void *mem, unsigned long order) +{ + free_pages(mem, 1ul << (order + PAGE_SHIFT)); +} + void *alloc_page() { void *p; diff --git a/lib/alloc_page.h b/lib/alloc_page.h index 5cdfec57a0a8..739a91def979 100644 --- a/lib/alloc_page.h +++ b/lib/alloc_page.h @@ -14,5 +14,6 @@ void *alloc_page(void); void *alloc_pages(unsigned long order); void free_page(void *page); void free_pages(void *mem, unsigned long size); +void free_pages_by_order(void *mem, unsigned long order); #endif diff --git a/x86/unittests.cfg b/x86/unittests.cfg index 694ee3d42f3a..05122cf91ea1 100644 --- a/x86/unittests.cfg +++ b/x86/unittests.cfg @@ -227,7 +227,7 @@ extra_params = -cpu qemu64,+umip [vmx] file = vmx.flat -extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test" +extra_params = -cpu host,+vmx -append "-exit_monitor_from_l2_test -ept_access* -vmx_smp* -vmx_vmcs_shadow_test -atomic_switch_overflow_msrs_test" arch = x86_64 groups = vmx diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c index f035f24a771a..6eb18d8fd3e4 100644 --- a/x86/vmx_tests.c +++ b/x86/vmx_tests.c @@ -8570,6 +8570,127 @@ static int invalid_msr_entry_failure(struct vmentry_failure *failure) return VMX_TEST_VMEXIT; } +/* + * The max number of MSRs in an atomic switch MSR list is: + * (111B + 1) * 512 = 4096 + * + * Each list entry consumes: + * 4-byte MSR index + 4 bytes reserved + 8-byte data = 16 bytes + * + * Allocate 128 kB to cover max_msr_list_size (i.e., 64 kB) and then some. + */ +static const u32 msr_list_page_order = 5; + +static void populate_msr_list(struct vmx_msr_entry *msr_list, + size_t byte_capacity, int count) +{ + int i; + + for (i = 0; i < count; i++) { + msr_list[i].index = MSR_IA32_TSC; + msr_list[i].reserved = 0; + msr_list[i].value = 0x1234567890abcdef; + } + + memset(msr_list + count, 0xff, + byte_capacity - count * sizeof(*msr_list)); +} + +static int max_msr_list_size(void) +{ + u32 vmx_misc = rdmsr(MSR_IA32_VMX_MISC); + u32 factor = ((vmx_misc & GENMASK(27, 25)) >> 25) + 1; + + return factor * 512; +} + +static void atomic_switch_msrs_test(int count) +{ + int max_allowed = max_msr_list_size(); + int byte_capacity = 1ul << (msr_list_page_order + PAGE_SHIFT); + /* KVM signals VM-Abort if an exit MSR list exceeds the max size. */ + int exit_count = MIN(count, max_allowed); + + /* + * Check for the IA32_TSC MSR, + * available with the "TSC flag" and used to populate the MSR lists. + */ + if (!(cpuid(1).d & (1 << 4))) { + report_skip(__func__); + return; + } + + /* Set L2 guest. */ + test_set_guest(v2_null_test_guest); + + /* Setup atomic MSR switch lists. */ + entry_msr_load = alloc_pages(msr_list_page_order); + exit_msr_load = alloc_pages(msr_list_page_order); + exit_msr_store = alloc_pages(msr_list_page_order); + + vmcs_write(ENTER_MSR_LD_ADDR, (u64)entry_msr_load); + vmcs_write(EXIT_MSR_LD_ADDR, (u64)exit_msr_load); + vmcs_write(EXIT_MSR_ST_ADDR, (u64)exit_msr_store); + + /* + * VM-Enter should succeed up to the max number of MSRs per list, and + * should not consume junk beyond the last entry. + */ + populate_msr_list(entry_msr_load, byte_capacity, count); + populate_msr_list(exit_msr_load, byte_capacity, exit_count); + populate_msr_list(exit_msr_store, byte_capacity, exit_count); + + vmcs_write(ENT_MSR_LD_CNT, count); + vmcs_write(EXI_MSR_LD_CNT, exit_count); + vmcs_write(EXI_MSR_ST_CNT, exit_count); + + if (count <= max_allowed) { + /* + * enter_guest() verifies that VM-enter succeeds. After the + * test completes, the test harness (see test_run() in vmx.c) + * verifies that the VM-enter completes by reaching the end of + * v2_null_test_guest(). + */ + enter_guest(); + } else { + u32 exit_reason; + u32 exit_reason_want; + u32 exit_qual; + + enter_guest_with_invalid_guest_state(); + + exit_reason = vmcs_read(EXI_REASON); + exit_reason_want = VMX_FAIL_MSR | VMX_ENTRY_FAILURE; + report("exit_reason %u, expected %u.", + exit_reason == exit_reason_want, exit_reason, + exit_reason_want); + + exit_qual = vmcs_read(EXI_QUALIFICATION); + report("exit_qual %u, expected %u.", + exit_qual == max_allowed + 1, exit_qual, + max_allowed + 1); + + /* Enter the guest (with valid counts) to set guest_finished. */ + vmcs_write(ENT_MSR_LD_CNT, 0); + vmcs_write(EXI_MSR_LD_CNT, 0); + vmcs_write(EXI_MSR_ST_CNT, 0); + enter_guest(); + } + + free_pages_by_order(entry_msr_load, msr_list_page_order); + free_pages_by_order(exit_msr_load, msr_list_page_order); + free_pages_by_order(exit_msr_store, msr_list_page_order); +} + +static void atomic_switch_max_msrs_test(void) +{ + atomic_switch_msrs_test(max_msr_list_size()); +} + +static void atomic_switch_overflow_msrs_test(void) +{ + atomic_switch_msrs_test(max_msr_list_size() + 1); +} #define TEST(name) { #name, .v2 = name } @@ -8660,5 +8781,8 @@ struct vmx_test vmx_tests[] = { TEST(ept_access_test_paddr_read_execute_ad_enabled), TEST(ept_access_test_paddr_not_present_page_fault), TEST(ept_access_test_force_2m_page), + /* Atomic MSR switch tests. */ + TEST(atomic_switch_max_msrs_test), + TEST(atomic_switch_overflow_msrs_test), { NULL, NULL, NULL, NULL, NULL, {0} }, };