Message ID | 20240307194255.1367442-1-dmatlack@google.com (mailing list archive)
---|---
State | New, archived
Series | KVM: selftests: Create memslot 0 at GPA 0x100000000 on x86_64
On Thu, Mar 07, 2024, David Matlack wrote:
> Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> KVM's private memslot for the APIC-access page.

This is going to cause other problems, e.g. from max_guest_memory_test.c:

	/*
	 * Skip the first 4gb and slot0.  slot0 maps <1gb and is used to back
	 * the guest's code, stack, and page tables.  Because selftests creates
	 * an IRQCHIP, a.k.a. a local APIC, KVM creates an internal memslot
	 * just below the 4gb boundary.  This test could create memory at
	 * 1gb-3gb, but it's simpler to skip straight to 4gb.
	 */
	const uint64_t start_gpa = SZ_4G;

Trying to move away from starting at '0' is going to be problematic/annoying,
e.g. using low memory allows tests to safely assume 4GiB+ is always available.
And I'd prefer not to make the infrastructure all twisty and weird for all tests
just because memstress tests want to play with huge amounts of memory.

Any chance we can solve this by using huge pages in the guest, and adjusting the
gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
of real world workloads (at least, I hope real world workloads are using 2MiB or
1GiB pages in this case).
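To put rough numbers on the "gorilla math" above, here is a minimal standalone
sketch (not from any posted patch; nr_pt_pages() is a made-up helper) of how
many x86-64 page-table pages a 256GiB guest needs with 4KiB versus 2MiB
mappings:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Each x86-64 paging structure is one 4KiB page holding 512 entries, so
 * every level of the walk divides the entry count by 512 (rounding up).
 */
static uint64_t nr_pt_pages(uint64_t nr_leaf_entries, int levels)
{
	uint64_t nr = 0;

	while (levels--) {
		nr_leaf_entries = (nr_leaf_entries + 511) / 512;
		nr += nr_leaf_entries;
	}
	return nr;
}

int main(void)
{
	uint64_t nr_4k = (256ULL << 30) / 4096;	/* 256GiB of 4KiB pages */

	/* 4KiB mappings: PTEs are the leaves of a 4-level walk. */
	printf("4KiB: %" PRIu64 " PT pages\n", nr_pt_pages(nr_4k, 4));

	/* 2MiB mappings: PMD entries are the leaves; the PT level vanishes. */
	printf("2MiB: %" PRIu64 " PT pages\n", nr_pt_pages(nr_4k / 512, 3));
	return 0;
}

With 4KiB mappings the walk needs 131,330 page-table pages (~513MiB); with
2MiB mappings it needs 258 (~1MiB), which is the saving being alluded to.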
On 2024-03-07 02:37 PM, Sean Christopherson wrote:
> On Thu, Mar 07, 2024, David Matlack wrote:
> > Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> > KVM's private memslot for the APIC-access page.
>
> This is going to cause other problems, e.g. from max_guest_memory_test.c
>
> 	/*
> 	 * Skip the first 4gb and slot0.  slot0 maps <1gb and is used to back
> 	 * the guest's code, stack, and page tables.  Because selftests creates
> 	 * an IRQCHIP, a.k.a. a local APIC, KVM creates an internal memslot
> 	 * just below the 4gb boundary.  This test could create memory at
> 	 * 1gb-3gb, but it's simpler to skip straight to 4gb.
> 	 */
> 	const uint64_t start_gpa = SZ_4G;
>
> Trying to move away from starting at '0' is going to be problematic/annoying,
> e.g. using low memory allows tests to safely assume 4GiB+ is always available.
> And I'd prefer not to make the infrastructure all twisty and weird for all tests
> just because memstress tests want to play with huge amounts of memory.
>
> Any chance we can solve this by using huge pages in the guest, and adjusting the
> gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
> use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
> of real world workloads (at least, I hope real world workloads are using 2MiB or
> 1GiB pages in this case).

There are real world workloads that use TiB of RAM with 4KiB mappings
(looking at you SAP HANA).

What about giving tests an explicit "start" GPA they can use? That would
fix max_guest_memory_test and avoid tests making assumptions about 4GiB
being a magically safe address to use.

e.g. Something like this on top:

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 194963e05341..584ac6fea65c 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -101,6 +101,11 @@ struct kvm_vm {
 	unsigned int page_shift;
 	unsigned int pa_bits;
 	unsigned int va_bits;
+	/*
+	 * Tests are able to use the guest physical address space from
+	 * [available_base_gpa, max_gfn << page_shift) for their own purposes.
+	 */
+	vm_paddr_t available_base_gpa;
 	uint64_t max_gfn;
 	struct list_head vcpus;
 	struct userspace_mem_regions regions;
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c8d7e66d308d..e74d9efa82c2 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -17,6 +17,7 @@
 #include <sys/stat.h>
 #include <unistd.h>
 #include <linux/kernel.h>
+#include <linux/sizes.h>
 
 #define KVM_UTIL_MIN_PFN	2
 
@@ -414,6 +415,7 @@ struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
 	uint64_t nr_pages = vm_nr_pages_required(shape.mode, nr_runnable_vcpus,
 						 nr_extra_pages);
 	struct userspace_mem_region *slot0;
+	vm_paddr_t ucall_mmio_gpa;
 	struct kvm_vm *vm;
 	int i;
 
@@ -436,7 +438,15 @@ struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
 	 * MMIO region would prevent silently clobbering the MMIO region.
 	 */
 	slot0 = memslot2region(vm, 0);
-	ucall_init(vm, slot0->region.guest_phys_addr + slot0->region.memory_size);
+	ucall_mmio_gpa = slot0->region.guest_phys_addr + slot0->region.memory_size;
+	ucall_init(vm, ucall_mmio_gpa);
+
+	/*
+	 * 1GiB is somewhat arbitrary, but is chosen to be large enough to meet
+	 * most tests' alignment requirements/expectations.
+	 */
+	vm->available_base_gpa =
+		SZ_1G * DIV_ROUND_UP(ucall_mmio_gpa + vm->page_size, SZ_1G);
 
 	kvm_arch_vm_post_create(vm);
 
diff --git a/tools/testing/selftests/kvm/max_guest_memory_test.c b/tools/testing/selftests/kvm/max_guest_memory_test.c
index 6628dc4dda89..f5d77d2a903d 100644
--- a/tools/testing/selftests/kvm/max_guest_memory_test.c
+++ b/tools/testing/selftests/kvm/max_guest_memory_test.c
@@ -156,18 +156,10 @@ static void calc_default_nr_vcpus(void)
 
 int main(int argc, char *argv[])
 {
-	/*
-	 * Skip the first 4gb and slot0.  slot0 maps <1gb and is used to back
-	 * the guest's code, stack, and page tables.  Because selftests creates
-	 * an IRQCHIP, a.k.a. a local APIC, KVM creates an internal memslot
-	 * just below the 4gb boundary.  This test could create memory at
-	 * 1gb-3gb, but it's simpler to skip straight to 4gb.
-	 */
-	const uint64_t start_gpa = SZ_4G;
 	const int first_slot = 1;
 
 	struct timespec time_start, time_run1, time_reset, time_run2;
-	uint64_t max_gpa, gpa, slot_size, max_mem, i;
+	uint64_t start_gpa, max_gpa, gpa, slot_size, max_mem, i;
 	int max_slots, slot, opt, fd;
 	bool hugepages = false;
 	struct kvm_vcpu **vcpus;
@@ -229,6 +221,7 @@ int main(int argc, char *argv[])
 	for (i = 0; i < slot_size; i += vm->page_size)
 		((uint8_t *)mem)[i] = 0xaa;
 
+	start_gpa = vm->available_base_gpa;
 	gpa = 0;
 	for (slot = first_slot; slot < max_slots; slot++) {
 		gpa = start_gpa + ((slot - first_slot) * slot_size);
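If the diff above were applied, a test might consume the new field along these
lines (a sketch; TEST_SLOT and TEST_SLOT_SIZE are invented for illustration,
while vm_userspace_mem_region_add() is the existing library API):

#include "kvm_util.h"
#include <linux/sizes.h>

#define TEST_SLOT	1		/* hypothetical test-owned slot */
#define TEST_SLOT_SIZE	SZ_2G		/* hypothetical slot size */

static void add_test_memory(struct kvm_vm *vm)
{
	/* Everything from available_base_gpa up to max_gfn is test-owned. */
	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
				    vm->available_base_gpa, TEST_SLOT,
				    TEST_SLOT_SIZE / vm->page_size, 0);
}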
On Thu, Mar 7, 2024 at 3:27 PM David Matlack <dmatlack@google.com> wrote:
>
> On 2024-03-07 02:37 PM, Sean Christopherson wrote:
> > On Thu, Mar 07, 2024, David Matlack wrote:
> > > Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> > > KVM's private memslot for the APIC-access page.
> >
> > This is going to cause other problems, e.g. from max_guest_memory_test.c
> >
> > 	/*
> > 	 * Skip the first 4gb and slot0.  slot0 maps <1gb and is used to back
> > 	 * the guest's code, stack, and page tables.  Because selftests creates
> > 	 * an IRQCHIP, a.k.a. a local APIC, KVM creates an internal memslot
> > 	 * just below the 4gb boundary.  This test could create memory at
> > 	 * 1gb-3gb, but it's simpler to skip straight to 4gb.
> > 	 */
> > 	const uint64_t start_gpa = SZ_4G;
> >
> > Trying to move away from starting at '0' is going to be problematic/annoying,
> > e.g. using low memory allows tests to safely assume 4GiB+ is always available.
> > And I'd prefer not to make the infrastructure all twisty and weird for all tests
> > just because memstress tests want to play with huge amounts of memory.
> >
> > Any chance we can solve this by using huge pages in the guest, and adjusting the
> > gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
> > use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
> > of real world workloads (at least, I hope real world workloads are using 2MiB or
> > 1GiB pages in this case).
>
> There are real world workloads that use TiB of RAM with 4KiB mappings
> (looking at you SAP HANA).
>
> What about giving tests an explicit "start" GPA they can use? That would
> fix max_guest_memory_test and avoid tests making assumptions about 4GiB
> being a magically safe address to use.
>
> e.g. Something like this on top:

Gah, I missed nx_huge_page_test.c, which needs similar changes to
max_guest_memory_test.c.

Also if you prefer the "start" address be a compile time constant we
can pick an arbitrary number above 4GiB and use that (e.g. start=32GiB
would be more than enough for a 12TiB guest).
On Thu, Mar 07, 2024, David Matlack wrote:
> On Thu, Mar 7, 2024 at 3:27 PM David Matlack <dmatlack@google.com> wrote:
> >
> > On 2024-03-07 02:37 PM, Sean Christopherson wrote:
> > > On Thu, Mar 07, 2024, David Matlack wrote:
> > > > Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> > > > KVM's private memslot for the APIC-access page.
> > >
> > > This is going to cause other problems, e.g. from max_guest_memory_test.c
> > >
> > > 	/*
> > > 	 * Skip the first 4gb and slot0.  slot0 maps <1gb and is used to back
> > > 	 * the guest's code, stack, and page tables.  Because selftests creates
> > > 	 * an IRQCHIP, a.k.a. a local APIC, KVM creates an internal memslot
> > > 	 * just below the 4gb boundary.  This test could create memory at
> > > 	 * 1gb-3gb, but it's simpler to skip straight to 4gb.
> > > 	 */
> > > 	const uint64_t start_gpa = SZ_4G;
> > >
> > > Trying to move away from starting at '0' is going to be problematic/annoying,
> > > e.g. using low memory allows tests to safely assume 4GiB+ is always available.
> > > And I'd prefer not to make the infrastructure all twisty and weird for all tests
> > > just because memstress tests want to play with huge amounts of memory.
> > >
> > > Any chance we can solve this by using huge pages in the guest, and adjusting the
> > > gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
> > > use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
> > > of real world workloads (at least, I hope real world workloads are using 2MiB or
> > > 1GiB pages in this case).
> >
> > There are real world workloads that use TiB of RAM with 4KiB mappings
> > (looking at you SAP HANA).
> >
> > What about giving tests an explicit "start" GPA they can use? That would
> > fix max_guest_memory_test and avoid tests making assumptions about 4GiB
> > being a magically safe address to use.
> >
> > e.g. Something like this on top:
>
> Gah, I missed nx_huge_page_test.c, which needs similar changes to
> max_guest_memory_test.c.
>
> Also if you prefer the "start" address be a compile time constant we
> can pick an arbitrary number above 4GiB and use that (e.g. start=32GiB
> would be more than enough for a 12TiB guest).

Aha!  Idea.  I think we can clean up multiple warts at once.

The underlying problem is the memory that is allocated for guest page tables.
The core code allocation is constant for any given test, and a complete
non-issue unless someone writes a *really* crazy test.  And while core data
(stack) allocations scale with the number of vCPUs, they are (a) artificially
bounded by the maximum number of vCPUs and (b) relatively small allocations
(a few pages per vCPU).

And for page table allocations, we already have this absurd magic value:

  #define KVM_GUEST_PAGE_TABLE_MIN_PADDR 0x180000

which I'm guessing exists to avoid clobbering code+stack allocations, but
that's an irrelevant tangent.

The other asset we now have is vm->memslots[NR_MEM_REGIONS], and more
specifically allocations for guest page tables are done via
vm->memslots[MEM_REGION_PT].

So, rather than more hardcoded addresses and/or a knob to control _all_ code
allocations, I think we should provide a knob to say that MEM_REGION_PT should
go to memory above 4GiB.  And to make memslot handling maintainable in the
long term:

 1. Add a knob to place MEM_REGION_PT at 4GiB (and as of this initial patch,
    conditionally in their own memslot).

 2. Use the PT_AT_4GIB (not the real name) knob for the various memstress
    tests that need it.

 3. Formalize memslots 0..2 (CODE, DATA, and PT) as being owned by the
    library, with memslots 3..MAX available for test usage.

 4. Modify tests that assume memslots 1..MAX are available, i.e. force them to
    start at MEM_REGION_TEST_DATA.

 5. Use separate memslots for CODE, DATA, and PT by default.  This will allow
    for more precise sizing of the CODE and DATA slots.

 6. Shrink the number of pages for CODE to a more reasonable number.  Currently
    vm_nr_pages_required() reserves 512 pages / 2MiB for per-VM assets, which
    at a glance seems ridiculously excessive.

 7. Use the PT_AT_4GIB knob in s390's CMMA test?  I suspect it does memslot
    shenanigans purely so that a low gfn (4096 in the test) is guaranteed to
    be available.

For #4, I think it's a fairly easy change.  E.g. set_memory_region_test.c,
just do s/MEM_REGION_SLOT/MEM_REGION_TEST_DATA.  And for
max_guest_memory_test.c:

@@ -157,14 +154,14 @@ static void calc_default_nr_vcpus(void)
 int main(int argc, char *argv[])
 {
 	/*
-	 * Skip the first 4gb and slot0.  slot0 maps <1gb and is used to back
-	 * the guest's code, stack, and page tables.  Because selftests creates
+	 * Skip the first 4gb, which are reserved by the core library for the
+	 * guest's code, stack, and page tables.  Because selftests creates
 	 * an IRQCHIP, a.k.a. a local APIC, KVM creates an internal memslot
 	 * just below the 4gb boundary.  This test could create memory at
 	 * 1gb-3gb, but it's simpler to skip straight to 4gb.
 	 */
 	const uint64_t start_gpa = SZ_4G;
-	const int first_slot = 1;
+	const int first_slot = MEM_REGION_TEST_DATA;
 
 	struct timespec time_start, time_run1, time_reset, time_run2;
 	uint64_t max_gpa, gpa, slot_size, max_mem, i;
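Item 1 might take roughly the following shape (purely a sketch; vm_place_page_tables(), the pt_at_4gib flag, and MEM_REGION_PT_SLOT are invented names, while vm->memslots[MEM_REGION_PT] is the existing routing mentioned above):

#include "kvm_util.h"
#include <linux/sizes.h>

#define MEM_REGION_PT_SLOT	3	/* hypothetical dedicated PT slot */

static void vm_place_page_tables(struct kvm_vm *vm, uint64_t nr_pt_pages,
				 bool pt_at_4gib)
{
	if (!pt_at_4gib)
		return;	/* default: page tables stay in memslot 0 */

	/* Back guest page tables with their own memslot at 4GiB. */
	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SZ_4G,
				    MEM_REGION_PT_SLOT, nr_pt_pages, 0);

	/* Route all future page-table allocations to the dedicated slot. */
	vm->memslots[MEM_REGION_PT] = MEM_REGION_PT_SLOT;
}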
On Wed, Mar 13, 2024 at 7:31 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Mar 07, 2024, David Matlack wrote:
> > On Thu, Mar 7, 2024 at 3:27 PM David Matlack <dmatlack@google.com> wrote:
> > >
> > > On 2024-03-07 02:37 PM, Sean Christopherson wrote:
> > > > On Thu, Mar 07, 2024, David Matlack wrote:
> > > > > Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> > > > > KVM's private memslot for the APIC-access page.
> > > >
> > > > Any chance we can solve this by using huge pages in the guest, and adjusting the
> > > > gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
> > > > use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
> > > > of real world workloads (at least, I hope real world workloads are using 2MiB or
> > > > 1GiB pages in this case).
> > >
> > > There are real world workloads that use TiB of RAM with 4KiB mappings
> > > (looking at you SAP HANA).
> > >
> > > What about giving tests an explicit "start" GPA they can use? That would
> > > fix max_guest_memory_test and avoid tests making assumptions about 4GiB
> > > being a magically safe address to use.
>
> So, rather than more hardcoded addresses and/or a knob to control _all_ code
> allocations, I think we should provide a knob to say that MEM_REGION_PT should
> go to memory above 4GiB.  And to make memslot handling maintainable in the
> long term:
>
> 1. Add a knob to place MEM_REGION_PT at 4GiB (and as of this initial patch,
>    conditionally in their own memslot).
>
> 2. Use the PT_AT_4GIB (not the real name) knob for the various memstress
>    tests that need it.

Making tests pick when to place page tables at 4GiB seems unnecessary.
Tests that don't otherwise need a specific physical memory layout
should be able to create a VM with any amount of memory and have it
just work.

It's also not impossible that a test has 4GiB+ .bss because the guest
needs a big array for something. In that case we'd need a knob to move
MEM_REGION_CODE above 4GiB on x86_64 as well.

For x86_64 (which is the only architecture AFAIK that has a private
memslot in KVM the framework can overlap with), what's the downside of
always putting all memslots above 4GiB?

> 3. Formalize memslots 0..2 (CODE, DATA, and PT) as being owned by the
>    library, with memslots 3..MAX available for test usage.
>
> 4. Modify tests that assume memslots 1..MAX are available, i.e. force them to
>    start at MEM_REGION_TEST_DATA.

I think MEM_REGION_TEST_DATA is just where the framework will satisfy
test-initiated dynamic memory allocations. That's different from which
slots are free for the test to use.

But assuming I understand your intention, I agree in spirit... Tests
should be allowed to use slots TEST_SLOT..MAX and physical addresses
TEST_GPA..MAX. The framework should provide both TEST_SLOT and
TEST_GPA (names pending), and existing tests should use those instead
of random hard-coded values.

> 5. Use separate memslots for CODE, DATA, and PT by default.  This will allow
>    for more precise sizing of the CODE and DATA slots.

What do you mean by "[separate memslots] will allow for more precise sizing"?

> 6. Shrink the number of pages for CODE to a more reasonable number.  Currently
>    vm_nr_pages_required() reserves 512 pages / 2MiB for per-VM assets, which
>    at a glance seems ridiculously excessive.
>
> 7. Use the PT_AT_4GIB knob in s390's CMMA test?  I suspect it does memslot
>    shenanigans purely so that a low gfn (4096 in the test) is guaranteed to
>    be available.

+Nico

Hm, if this test _needs_ to use GFN 4096, then maybe the framework can
give tests two regions 0..KVM_FRAMEWORK_GPA and TEST_GPA..MAX.

If the test just needs any GFN then it can use TEST_GPA instead of
4096 << page_shift.
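The framework-provided values being proposed could be as small as two constants (a sketch; the names and values are placeholders pending the v2 series):

/*
 * Sketch: first memslot and GPA guaranteed to be free for test usage; the
 * library promises to keep its own CODE/DATA/PT allocations below these.
 */
#define KVM_UTIL_TEST_SLOT	NR_MEM_REGIONS
#define KVM_UTIL_TEST_GPA	SZ_4G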
On Thu, Mar 14, 2024, David Matlack wrote:
> On Wed, Mar 13, 2024 at 7:31 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Mar 07, 2024, David Matlack wrote:
> > > On Thu, Mar 7, 2024 at 3:27 PM David Matlack <dmatlack@google.com> wrote:
> > > >
> > > > On 2024-03-07 02:37 PM, Sean Christopherson wrote:
> > > > > On Thu, Mar 07, 2024, David Matlack wrote:
> > > > > > Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> > > > > > KVM's private memslot for the APIC-access page.
> > > > >
> > > > > Any chance we can solve this by using huge pages in the guest, and adjusting the
> > > > > gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
> > > > > use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
> > > > > of real world workloads (at least, I hope real world workloads are using 2MiB or
> > > > > 1GiB pages in this case).
> > > >
> > > > There are real world workloads that use TiB of RAM with 4KiB mappings
> > > > (looking at you SAP HANA).
> > > >
> > > > What about giving tests an explicit "start" GPA they can use? That would
> > > > fix max_guest_memory_test and avoid tests making assumptions about 4GiB
> > > > being a magically safe address to use.
> >
> > So, rather than more hardcoded addresses and/or a knob to control _all_ code
> > allocations, I think we should provide a knob to say that MEM_REGION_PT should
> > go to memory above 4GiB.  And to make memslot handling maintainable in the
> > long term:
> >
> > 1. Add a knob to place MEM_REGION_PT at 4GiB (and as of this initial patch,
> >    conditionally in their own memslot).
> >
> > 2. Use the PT_AT_4GIB (not the real name) knob for the various memstress
> >    tests that need it.
>
> Making tests pick when to place page tables at 4GiB seems unnecessary.
> Tests that don't otherwise need a specific physical memory layout
> should be able to create a VM with any amount of memory and have it
> just work.
>
> It's also not impossible that a test has 4GiB+ .bss because the guest
> needs a big array for something. In that case we'd need a knob to move
> MEM_REGION_CODE above 4GiB on x86_64 as well.

LOL, at that point, the test can darn well dynamically allocate its memory.
Though I'd love to see a test that needs a 3GiB array :-)

> For x86_64 (which is the only architecture AFAIK that has a private
> memslot in KVM the framework can overlap with), what's the downside of
> always putting all memslots above 4GiB?

Divergence from other architectures, divergence from "real" VM configurations,
and a more complicated programming environment for the vast majority of tests.
E.g. a test that uses more than ~3GiB of memory would need to dynamically place
its test specific memslots, whereas if the core library keeps everything under
4GiB by default, then on x86 every test knows it has 4GiB+ to play with.

One could argue that dynamically placing test specific memslots would be more
elegant, but I'd prefer to avoid that because I can't think of any value it
would add from a test coverage perspective, and static addresses are much
easier when it comes to debug.

Having tests use e.g. 2GiB-3GiB or 1GiB-3GiB, would kinda sorta work, but that
2GiB limit isn't trivial, e.g. max_guest_memory_test creates TiBs of memslots.

IMO, memstress is the odd one out, it should be the one that needs to do
special things.

> > 3. Formalize memslots 0..2 (CODE, DATA, and PT) as being owned by the
> >    library, with memslots 3..MAX available for test usage.
> >
> > 4. Modify tests that assume memslots 1..MAX are available, i.e. force them to
> >    start at MEM_REGION_TEST_DATA.
>
> I think MEM_REGION_TEST_DATA is just where the framework will satisfy
> test-initiated dynamic memory allocations. That's different from which
> slots are free for the test to use.
>
> But assuming I understand your intention, I agree in spirit... Tests
> should be allowed to use slots TEST_SLOT..MAX and physical addresses
> TEST_GPA..MAX. The framework should provide both TEST_SLOT and
> TEST_GPA (names pending), and existing tests should use those instead
> of random hard-coded values.
>
> > 5. Use separate memslots for CODE, DATA, and PT by default.  This will allow
> >    for more precise sizing of the CODE and DATA slots.
>
> What do you mean by "[separate memslots] will allow for more precise sizing"?

I suspect there is a _lot_ of slop in the arbitrary 512 pages that are tacked
on by vm_nr_pages_required().  Those 2MiBs probably don't matter, I just don't
like completely magical numbers.

> > 6. Shrink the number of pages for CODE to a more reasonable number.  Currently
> >    vm_nr_pages_required() reserves 512 pages / 2MiB for per-VM assets, which
> >    at a glance seems ridiculously excessive.
> >
> > 7. Use the PT_AT_4GIB knob in s390's CMMA test?  I suspect it does memslot
> >    shenanigans purely so that a low gfn (4096 in the test) is guaranteed to
> >    be available.
>
> +Nico
>
> Hm, if this test _needs_ to use GFN 4096, then maybe the framework can
> give tests two regions 0..KVM_FRAMEWORK_GPA and TEST_GPA..MAX.
>
> If the test just needs any GFN then it can use TEST_GPA instead of
> 4096 << page_shift.
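For reference, item 5's default layout could look roughly like this (a sketch; the one-slot-per-region loop and the per-region base GPAs are placeholders, and today all regions alias slot 0):

#include "kvm_util.h"
#include <linux/sizes.h>

static void vm_create_default_memslots(struct kvm_vm *vm,
				       uint64_t nr_pages[NR_MEM_REGIONS])
{
	int i;

	/*
	 * One slot per region (CODE, DATA, PT, TEST_DATA) makes each
	 * region's sizing independently auditable, instead of one blob
	 * sized by gorilla math.
	 */
	for (i = 0; i < NR_MEM_REGIONS; i++) {
		vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
					    (uint64_t)i * SZ_1G, i,
					    nr_pages[i], 0);
		vm->memslots[i] = i;
	}
}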
On Thu, Mar 14, 2024 at 4:14 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Mar 14, 2024, David Matlack wrote:
> > On Wed, Mar 13, 2024 at 7:31 AM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Thu, Mar 07, 2024, David Matlack wrote:
> > > > On Thu, Mar 7, 2024 at 3:27 PM David Matlack <dmatlack@google.com> wrote:
> > > > >
> > > > > On 2024-03-07 02:37 PM, Sean Christopherson wrote:
> > > > > > On Thu, Mar 07, 2024, David Matlack wrote:
> > > > > > > Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
> > > > > > > KVM's private memslot for the APIC-access page.
> > > > > >
> > > > > > Any chance we can solve this by using huge pages in the guest, and adjusting the
> > > > > > gorilla math in vm_nr_pages_required() accordingly?  There's really no reason to
> > > > > > use 4KiB pages for a VM with 256GiB of memory.  That'd also be more representative
> > > > > > of real world workloads (at least, I hope real world workloads are using 2MiB or
> > > > > > 1GiB pages in this case).
> > > > >
> > > > > There are real world workloads that use TiB of RAM with 4KiB mappings
> > > > > (looking at you SAP HANA).
> > > > >
> > > > > What about giving tests an explicit "start" GPA they can use? That would
> > > > > fix max_guest_memory_test and avoid tests making assumptions about 4GiB
> > > > > being a magically safe address to use.
> > >
> > > So, rather than more hardcoded addresses and/or a knob to control _all_ code
> > > allocations, I think we should provide a knob to say that MEM_REGION_PT should
> > > go to memory above 4GiB.  And to make memslot handling maintainable in the
> > > long term:
> > >
> > > 1. Add a knob to place MEM_REGION_PT at 4GiB (and as of this initial patch,
> > >    conditionally in their own memslot).
> > >
> > > 2. Use the PT_AT_4GIB (not the real name) knob for the various memstress
> > >    tests that need it.
> >
> > Making tests pick when to place page tables at 4GiB seems unnecessary.
> > Tests that don't otherwise need a specific physical memory layout
> > should be able to create a VM with any amount of memory and have it
> > just work.
> >
> > It's also not impossible that a test has 4GiB+ .bss because the guest
> > needs a big array for something. In that case we'd need a knob to move
> > MEM_REGION_CODE above 4GiB on x86_64 as well.
>
> LOL, at that point, the test can darn well dynamically allocate its memory.
> Though I'd love to see a test that needs a 3GiB array :-)
>
> > For x86_64 (which is the only architecture AFAIK that has a private
> > memslot in KVM the framework can overlap with), what's the downside of
> > always putting all memslots above 4GiB?
>
> Divergence from other architectures, divergence from "real" VM configurations,
> and a more complicated programming environment for the vast majority of tests.
> E.g. a test that uses more than ~3GiB of memory would need to dynamically place
> its test specific memslots, whereas if the core library keeps everything under
> 4GiB by default, then on x86 every test knows it has 4GiB+ to play with.

Divergence from real VM configurations is a really good point. I was
thinking we _could_ make all architectures start at 4GiB and make 32GiB
the new static address available for tests to solve the architecture
divergence and complexity problems. But that would mean all KVM
selftests don't use GPAs 0-4GiB, and that seems like a terrible idea
now that you point it out.

Thanks for the thoughtful feedback. I'll give your suggestions a try in v2.

> One could argue that dynamically placing test specific memslots would be more
> elegant, but I'd prefer to avoid that because I can't think of any value it
> would add from a test coverage perspective, and static addresses are much
> easier when it comes to debug.
>
> Having tests use e.g. 2GiB-3GiB or 1GiB-3GiB, would kinda sorta work, but that
> 2GiB limit isn't trivial, e.g. max_guest_memory_test creates TiBs of memslots.
>
> IMO, memstress is the odd one out, it should be the one that needs to do
> special things.

Fair point.

> > > 3. Formalize memslots 0..2 (CODE, DATA, and PT) as being owned by the
> > >    library, with memslots 3..MAX available for test usage.
> > >
> > > 4. Modify tests that assume memslots 1..MAX are available, i.e. force them to
> > >    start at MEM_REGION_TEST_DATA.
> >
> > I think MEM_REGION_TEST_DATA is just where the framework will satisfy
> > test-initiated dynamic memory allocations. That's different from which
> > slots are free for the test to use.
> >
> > But assuming I understand your intention, I agree in spirit... Tests
> > should be allowed to use slots TEST_SLOT..MAX and physical addresses
> > TEST_GPA..MAX. The framework should provide both TEST_SLOT and
> > TEST_GPA (names pending), and existing tests should use those instead
> > of random hard-coded values.
> >
> > > 5. Use separate memslots for CODE, DATA, and PT by default.  This will allow
> > >    for more precise sizing of the CODE and DATA slots.
> >
> > What do you mean by "[separate memslots] will allow for more precise sizing"?
>
> I suspect there is a _lot_ of slop in the arbitrary 512 pages that are tacked
> on by vm_nr_pages_required().  Those 2MiBs probably don't matter, I just don't
> like completely magical numbers.

That makes sense, we can probably tighten up those heuristics and
maybe even get rid of the magic numbers. But I wasn't following what
_separate memslots_ has to do with it?
On Fri, Mar 15, 2024, David Matlack wrote:
> On Thu, Mar 14, 2024 at 4:14 PM Sean Christopherson <seanjc@google.com> wrote:
> > > > 5. Use separate memslots for CODE, DATA, and PT by default.  This will allow
> > > >    for more precise sizing of the CODE and DATA slots.
> > >
> > > What do you mean by "[separate memslots] will allow for more precise sizing"?
> >
> > I suspect there is a _lot_ of slop in the arbitrary 512 pages that are tacked
> > on by vm_nr_pages_required().  Those 2MiBs probably don't matter, I just don't
> > like completely magical numbers.
>
> That makes sense, we can probably tighten up those heuristics and
> maybe even get rid of the magic numbers. But I wasn't following what
> _separate memslots_ has to do with it?

Ah, don't dwell on that too much, it was somewhat of a passing thought.

My thinking was that the gorilla math for the page tables makes it hard to
determine how much of those 512 pages is actually for DATA, versus how much is
actually a weird recursive calculation for the page tables themselves.  I.e.
it's hard to trim down @nr_pages, because it might not be easy to tell if DATA
shrank too much, or if it actually caused the PT to shrink too much, and so no
one touches that mess.
Quoting David Matlack (2024-03-14 22:11:57)
[...]
> > 7. Use the PT_AT_4GIB knob in s390's CMMA test?  I suspect it does memslot
> >    shenanigans purely so that a low gfn (4096 in the test) is guaranteed to
> >    be available.
>
> +Nico
>
> Hm, if this test _needs_ to use GFN 4096, then maybe the framework can
> give tests two regions 0..KVM_FRAMEWORK_GPA and TEST_GPA..MAX.
>
> If the test just needs any GFN then it can use TEST_GPA instead of
> 4096 << page_shift.

Sorry for the late reply, this got buried at the bottom of my inbox :(

The test creates two memslots with a gap in between to be able to test
that gaps in memslots are correctly skipped by the CMMA-related ioctls.
The test doesn't need GFN 4096. It should not matter where TEST_GPA is,
as long as there's a gap after the main memslot.

Feel free to include me on a follow-up, then I can test and look at this.
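The layout Nico describes, two memslots separated by a deliberate hole so the CMMA ioctls have a gap to skip, might be set up like this (a sketch; TEST_GPA, TEST_SLOT, and the page counts are placeholders):

#include "kvm_util.h"
#include <linux/sizes.h>

#define TEST_GPA	SZ_4G	/* placeholder framework-provided values */
#define TEST_SLOT	3
#define SLOT_PAGES	32	/* placeholder sizes */
#define GAP_PAGES	16

static void add_cmma_test_slots(struct kvm_vm *vm)
{
	/* Main memslot, starting at the first test-owned GPA. */
	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, TEST_GPA,
				    TEST_SLOT, SLOT_PAGES, 0);

	/*
	 * Leave GAP_PAGES unbacked so there is a hole after the main
	 * memslot, then add the second memslot past the gap.
	 */
	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
				    TEST_GPA + (SLOT_PAGES + GAP_PAGES) *
						vm->page_size,
				    TEST_SLOT + 1, SLOT_PAGES, 0);
}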
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 3bd03b088dda..ee33528007ee 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -20,6 +20,12 @@
 
 #include "../kvm_util.h"
 
+/*
+ * Create memslot 0 at 4GiB to avoid overlapping with KVM's private memslot for
+ * the APIC-access page at 0xfee00000.
+ */
+#define KVM_UTIL_MEMSLOT0_GPA 0x100000000ULL
+
 extern bool host_cpu_is_intel;
 extern bool host_cpu_is_amd;
 
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index b2262b5fad9e..c8d7e66d308d 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -20,6 +20,10 @@
 
 #define KVM_UTIL_MIN_PFN	2
 
+#ifndef KVM_UTIL_MEMSLOT0_GPA
+#define KVM_UTIL_MEMSLOT0_GPA 0ULL
+#endif
+
 static int vcpu_mmap_sz(void);
 
 int open_path_or_exit(const char *path, int flags)
@@ -418,7 +422,8 @@ struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
 
 	vm = ____vm_create(shape);
 
-	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    KVM_UTIL_MEMSLOT0_GPA, 0, nr_pages, 0);
 	for (i = 0; i < NR_MEM_REGIONS; i++)
 		vm->memslots[i] = 0;
Create memslot 0 at 0x100000000 (4GiB) to avoid it overlapping with
KVM's private memslot for the APIC-access page.

Without this change running large-enough selftests (high number of
vCPUs and/or guest memory) can result in KVM_CREATE_VCPU failing
because the APIC-access page memslot overlaps memslot 0 created by the
selftest:

  $ ./dirty_log_perf_test -v 256 -s anonymous_hugetlb_1gb -b 4G
  Test iterations: 2
  Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
  ==== Test Assertion Failure ====
    lib/kvm_util.c:1341: vcpu->fd >= 0
    pid=53654 tid=53654 errno=17 - File exists
    (stack trace empty)
    KVM_CREATE_VCPU failed, rc: -1 errno: 17 (File exists)

Signed-off-by: David Matlack <dmatlack@google.com>
---
 tools/testing/selftests/kvm/include/x86_64/processor.h | 6 ++++++
 tools/testing/selftests/kvm/lib/kvm_util.c             | 7 ++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)


base-commit: 0c64952fec3ea01cb5b09f00134200f3e7ab40d5