Message ID | 1305061513-28360-1-git-send-email-levinsasha928@gmail.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> +	if (kvm->ram_size < 0xe0000000) {

Please use the ULL postfix for constants to ensure the types are sane.
Also, please come up with a sane name for these.

> @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
>
>  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
>  {
> -	return self->ram_start + offset;
> +	/*
> +	 * We have a gap between 0xe0000000 and 0x100000000.
> +	 * Consider it when translating an address above 0x100000000.
> +	 */
> +	if (offset < 0xe0000000)
> +		return self->ram_start + offset;
> +	else
> +		return self->ram_start + 0xe0000000 + (offset - 0x100000000);
>  }

Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
maps about two separate regions but guest_flat_to_host() would
work-as-is.

			Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
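For illustration, the two review comments above (ULL suffix, named constants) could look roughly like the sketch below. The names `KVM_32BIT_GAP_START`, `KVM_32BIT_GAP_END`, `KVM_32BIT_GAP_SIZE` and `guest_phys_to_ram_offset()` are hypothetical and do not come from the patch; the arithmetic mirrors the posted guest_flat_to_host() change, expressed as an offset into a host buffer that holds guest RAM back to back.

```c
#include <stdint.h>

/* Hypothetical names; the review only asks for "a sane name". */
#define KVM_32BIT_GAP_START 0xe0000000ULL
#define KVM_32BIT_GAP_END   0x100000000ULL
#define KVM_32BIT_GAP_SIZE  (KVM_32BIT_GAP_END - KVM_32BIT_GAP_START)

/*
 * Translate a guest physical address into an offset in a host buffer
 * that stores guest RAM without a hole. Addresses inside the gap are
 * not backed by RAM; like the posted patch, this sketch does not
 * reject them.
 */
static uint64_t guest_phys_to_ram_offset(uint64_t guest_phys)
{
    if (guest_phys < KVM_32BIT_GAP_START)
        return guest_phys;

    /* Same as GAP_START + (guest_phys - GAP_END) in the patch. */
    return guest_phys - KVM_32BIT_GAP_SIZE;
}
```

With the ULL suffix every constant has the same 64-bit unsigned type, so the subtraction cannot silently happen in a narrower type.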
On Wed, 2011-05-11 at 08:37 +0300, Pekka Enberg wrote:
> On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > +	if (kvm->ram_size < 0xe0000000) {
>
> Please use the ULL postfix for constants to ensure the types are sane.
> Also, please come up with a sane name for these.
>
> > @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
> >
> >  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
> >  {
> > -	return self->ram_start + offset;
> > +	/*
> > +	 * We have a gap between 0xe0000000 and 0x100000000.
> > +	 * Consider it when translating an address above 0x100000000.
> > +	 */
> > +	if (offset < 0xe0000000)
> > +		return self->ram_start + offset;
> > +	else
> > +		return self->ram_start + 0xe0000000 + (offset - 0x100000000);
> >  }
>
> Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
> region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
> maps about two separate regions but guest_flat_to_host() would
> work-as-is.

I've wanted to avoid actually allocating that gap (which is currently
512MB) and instead take the hit in guest_flat_to_host().

If you feel the 512MB vs guest_flat_to_host() trade-off is worth it,
I'll change it to work that way.
On 5/11/11 9:21 AM, Sasha Levin wrote:
> If you feel the 512MB vs guest_flat_to_host() trade-off is worth it,
> I'll change it to work that way.

Why would it not be? This is 64-bit only, right? There's plenty of
virtual address space and mprotect() should make sure we never allocate
physical pages for it. Sure, there's some in-kernel overhead involved as
well, but that's extremely small.

I'm not worried about performance in guest_flat_to_host() but I think
the current implementation is not very clean. If you want to mmap() two
separate regions, we should have our own internal "memory map" that's
used for this (and for populating KVM and E820 maps).

So I think mmap'ing the gap is the cleanest solution for now. We can
revisit the decision if we need even more regions in the future.

			Pekka
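A minimal sketch of the mmap()/mprotect() approach described above, assuming a Linux host. The helper name `map_ram_with_hole()` and its parameters are hypothetical; the sizes are parameters so the idea can be tried with small values, and `MAP_NORESERVE` is an extra assumption that is not mentioned in the thread.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map low RAM, a hole, and high RAM as one
 * contiguous host range, then revoke all access to the hole.
 * All sizes must be multiples of the page size.
 * Returns MAP_FAILED on error.
 */
static void *map_ram_with_hole(size_t low_size, size_t hole_size,
                               size_t high_size)
{
    size_t total = low_size + hole_size + high_size;
    char *start;

    start = mmap(NULL, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (start == MAP_FAILED)
        return MAP_FAILED;

    /* The hole keeps its virtual addresses but can never be touched. */
    if (hole_size && mprotect(start + low_size, hole_size, PROT_NONE) < 0) {
        munmap(start, total);
        return MAP_FAILED;
    }

    return start;
}
```

With `low_size` set to 0xe0000000 and `hole_size` to 0x20000000 on a 64-bit host, a host offset equals the guest physical address, so the original one-line guest_flat_to_host() keeps working. The second KVM slot would then use the address at offset 0x100000000 in the mapping as its userspace address.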
* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On 5/11/11 9:21 AM, Sasha Levin wrote:
> >If you feel the 512MB vs guest_flat_to_host() trade-off is worth it,
> >I'll change it to work that way.
>
> Why would it not be? This is 64-bit only, right? There's plenty of virtual
> address space and mprotect() should make sure we never allocate physical
> pages for it. Sure, there's some in-kernel overhead involved as well, but
> that's extremely small.
>
> I'm not worried about performance in guest_flat_to_host() but I think the
> current implementation is not very clean. If you want to mmap() two separate
> regions, we should have our own internal "memory map" that's used for this
> (and for populating KVM and E820 maps).
>
> So I think mmap'ing the gap is the cleanest solution for now. We can revisit
> the decision if we need even more regions in the future.

Agreed. There's also admittedly somewhat of a conceptual beauty in having a
linearly addressable chunk of *all* guest physical RAM on the hypervisor side.

Virtualization involves so many indirections to begin with that keeping the
mental picture simpler is helpful IMHO ...

Thanks,

	Ingo
On 05/11/2011 08:37 AM, Pekka Enberg wrote:
> On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > +	if (kvm->ram_size < 0xe0000000) {
>
> Please use the ULL postfix for constants to ensure the types are sane.
> Also, please come up with a sane name for these.
>
> > @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
> >
> >  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
> >  {
> > -	return self->ram_start + offset;
> > +	/*
> > +	 * We have a gap between 0xe0000000 and 0x100000000.
> > +	 * Consider it when translating an address above 0x100000000.
> > +	 */
> > +	if (offset < 0xe0000000)
> > +		return self->ram_start + offset;
> > +	else
> > +		return self->ram_start + 0xe0000000 + (offset - 0x100000000);
> >  }
>
> Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
> region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
> maps about two separate regions but guest_flat_to_host() would
> work-as-is.

It doesn't work in general - if you have a PCI device with a BAR
(like a video card framebuffer), then you need allocations for main
memory (0+) and pci (0xe0000000+). You can't have a contiguous
mapping on i386 containing both.
* Avi Kivity <avi@redhat.com> wrote:

> On 05/11/2011 08:37 AM, Pekka Enberg wrote:
> >On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> >> +	if (kvm->ram_size < 0xe0000000) {
> >
> >Please use the ULL postfix for constants to ensure the types are sane.
> >Also, please come up with a sane name for these.
> >
> >> @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
> >>
> >>  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
> >>  {
> >> -	return self->ram_start + offset;
> >> +	/*
> >> +	 * We have a gap between 0xe0000000 and 0x100000000.
> >> +	 * Consider it when translating an address above 0x100000000.
> >> +	 */
> >> +	if (offset < 0xe0000000)
> >> +		return self->ram_start + offset;
> >> +	else
> >> +		return self->ram_start + 0xe0000000 + (offset - 0x100000000);
> >>  }
> >
> >Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
> >region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
> >maps about two separate regions but guest_flat_to_host() would
> >work-as-is.
>
> It doesn't work in general - if you have a PCI device with a BAR
> (like a video card framebuffer), then you need allocations for main
> memory (0+) and pci (0xe0000000+). You can't have a contiguous
> mapping on i386 containing both.

I think in tools/kvm/ we can ignore i386 hosts that would like to map more RAM
than they have virtual address space for ...

Guests up to 1-2 gigs of RAM will still work fine.

Thanks,

	Ingo
On 05/11/2011 11:44 AM, Ingo Molnar wrote:
> >
> > It doesn't work in general - if you have a PCI device with a BAR
> > (like a video card framebuffer), then you need allocations for main
> > memory (0+) and pci (0xe0000000+). You can't have a contiguous
> > mapping on i386 containing both.
>
> I think in tools/kvm/ we can ignore i386 hosts that would like to map more RAM
> than they have virtual address space for ...
>
> Guests up to 1-2 gigs of RAM will still work fine.
>

It doesn't work. Consider a guest with 128MB of RAM mapped at 0-128MB
and an 8MB framebuffer mapped at 0xe0000000. There's no way to present
this in a contiguous space.

Well, I guess you can map the framebuffer lower, but that means giving
up memory hotplug if you ever wish to implement it (and doesn't allow
the guest to remap the framebuffer if it wishes to).
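One way to picture the non-contiguous case above is a small region table, along the lines of the internal "memory map" mentioned earlier in the thread. This is only a sketch: `struct mem_region` and `region_to_host()` are hypothetical names, not part of tools/kvm, and a linear search is used for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one guest-physical range and its backing. */
struct mem_region {
    uint64_t guest_phys; /* first guest physical address of the range */
    uint64_t size;       /* length of the range in bytes */
    void *host;          /* host address backing guest_phys */
};

/*
 * Look up the region containing guest_phys and return the matching
 * host address, or NULL if no region covers it (for example an
 * address inside a hole).
 */
static void *region_to_host(const struct mem_region *map, size_t count,
                            uint64_t guest_phys)
{
    for (size_t i = 0; i < count; i++) {
        const struct mem_region *r = &map[i];

        if (guest_phys >= r->guest_phys &&
            guest_phys - r->guest_phys < r->size)
            return (char *)r->host + (guest_phys - r->guest_phys);
    }

    return NULL;
}
```

For the example in the mail, the table would hold one entry for 128MB of RAM at guest address 0 and one for an 8MB framebuffer at 0xe0000000, each with its own host allocation, so no contiguous host range is required.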
On 5/11/11 12:06 PM, Avi Kivity wrote:
> Well, I guess you can map the framebuffer lower, but that means giving
> up memory hotplug if you ever wish to implement it (and doesn't allow
> the guest to remap the framebuffer if it wishes to).

True. As I said, we need to do it properly at some point. But as long as
it's only about the PCI hole for > 4GB guests, let's mmap() the whole
range instead.

			Pekka
diff --git a/tools/kvm/bios.c b/tools/kvm/bios.c
index 2199c0c..cd417fa 100644
--- a/tools/kvm/bios.c
+++ b/tools/kvm/bios.c
@@ -61,7 +61,7 @@ static void e820_setup(struct kvm *kvm)
 	size = guest_flat_to_host(kvm, E820_MAP_SIZE);
 	mem_map = guest_flat_to_host(kvm, E820_MAP_START);
 
-	*size = E820_MEM_AREAS;
+
 
 	mem_map[i++] = (struct e820_entry) {
 		.addr = REAL_MODE_IVT_BEGIN,
@@ -78,13 +78,28 @@ static void e820_setup(struct kvm *kvm)
 		.size = MB_BIOS_END - MB_BIOS_BEGIN,
 		.type = E820_MEM_RESERVED,
 	};
-	mem_map[i++] = (struct e820_entry) {
-		.addr = BZ_KERNEL_START,
-		.size = kvm->ram_size - BZ_KERNEL_START,
-		.type = E820_MEM_USABLE,
-	};
+	if (kvm->ram_size < 0xe0000000) {
+		mem_map[i++] = (struct e820_entry) {
+			.addr = BZ_KERNEL_START,
+			.size = kvm->ram_size - BZ_KERNEL_START,
+			.type = E820_MEM_USABLE,
+		};
+	} else {
+		mem_map[i++] = (struct e820_entry) {
+			.addr = BZ_KERNEL_START,
+			.size = 0xe0000000 - BZ_KERNEL_START,
+			.type = E820_MEM_USABLE,
+		};
+		mem_map[i++] = (struct e820_entry) {
+			.addr = 0x100000000ULL,
+			.size = kvm->ram_size - 0xe0000000 - BZ_KERNEL_START,
+			.type = E820_MEM_USABLE,
+		};
+	}
 
 	BUILD_BUG_ON(i > E820_MEM_AREAS);
+
+	*size = i;
 }
 
 /**
diff --git a/tools/kvm/include/kvm/e820.h b/tools/kvm/include/kvm/e820.h
index 252ae1f..e0f5f2a 100644
--- a/tools/kvm/include/kvm/e820.h
+++ b/tools/kvm/include/kvm/e820.h
@@ -8,7 +8,7 @@
 #define E820_MEM_USABLE 1
 #define E820_MEM_RESERVED 2
 
-#define E820_MEM_AREAS 4
+#define E820_MEM_AREAS 5
 
 struct e820_entry {
 	u64 addr; /* start of memory segment */
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 3dab78d..e9c16ea 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
 
 static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
 {
-	return self->ram_start + offset;
+	/*
+	 * We have a gap between 0xe0000000 and 0x100000000.
+	 * Consider it when translating an address above 0x100000000.
+	 */
+	if (offset < 0xe0000000)
+		return self->ram_start + offset;
+	else
+		return self->ram_start + 0xe0000000 + (offset - 0x100000000);
 }
 
 static inline void *guest_real_to_host(struct kvm *self, u16 selector, u16 offset)
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 65793f2..976b099 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -153,23 +153,33 @@ static bool kvm__cpu_supports_vm(void)
 	return regs.ecx & (1 << feature);
 }
 
-void kvm__init_ram(struct kvm *self)
+static void kvm_register_mem_slot(struct kvm *kvm, u32 slot, u64 guest_phys, u64 size, u64 userspace_addr)
 {
 	struct kvm_userspace_memory_region mem;
 	int ret;
 
 	mem = (struct kvm_userspace_memory_region) {
-		.slot = 0,
-		.guest_phys_addr = 0x0UL,
-		.memory_size = self->ram_size,
-		.userspace_addr = (unsigned long) self->ram_start,
+		.slot = slot,
+		.guest_phys_addr = guest_phys,
+		.memory_size = size,
+		.userspace_addr = userspace_addr,
 	};
 
-	ret = ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
+	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
 	if (ret < 0)
 		die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
 }
 
+void kvm__init_ram(struct kvm *self)
+{
+	if (self->ram_size < 0xe0000000) {
+		kvm_register_mem_slot(self, 0, 0, self->ram_size, (u64)self->ram_start);
+	} else {
+		kvm_register_mem_slot(self, 0, 0, 0xe0000000, (u64)self->ram_start);
+		kvm_register_mem_slot(self, 1, 0x100000000ULL, self->ram_size - 0xe0000000, (u64)self->ram_start + 0xe0000000);
+	}
+}
+
 int kvm__max_cpus(struct kvm *self)
 {
 	int ret;
Add a memory gap between 0xe0000000 and 0x100000000
when using more than 0xe0000000 bytes for guest RAM.

This space is used by several things, PCI configuration
space for example.

This patch updates the e820 table, slot allocations
used for KVM_SET_USER_MEMORY_REGION, and the address
translation.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 tools/kvm/bios.c             | 27 +++++++++++++++++++++------
 tools/kvm/include/kvm/e820.h |  2 +-
 tools/kvm/include/kvm/kvm.h  |  9 ++++++++-
 tools/kvm/kvm.c              | 22 ++++++++++++++++------
 4 files changed, 46 insertions(+), 14 deletions(-)
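As a worked example of the layout this commit message describes, the sketch below computes the guest-physical RAM ranges for a given RAM size. It follows the slot layout in kvm__init_ram() from the patch; `GAP_START`, `GAP_END`, `struct ram_range` and `split_guest_ram()` are hypothetical names. It uses `<=` so a guest with exactly 0xe0000000 bytes does not get an empty second range, which differs from the `<` comparison in the patch. Note also that the e820 hunk as posted subtracts BZ_KERNEL_START from the high range while kvm__init_ram() does not; the sketch follows kvm__init_ram().

```c
#include <stdint.h>

#define GAP_START 0xe0000000ULL  /* hypothetical names for the patch's */
#define GAP_END   0x100000000ULL /* literal constants */

struct ram_range {
    uint64_t guest_phys; /* where the range starts in the guest */
    uint64_t size;       /* length in bytes */
};

/*
 * Split ram_size bytes of guest RAM around the gap.
 * Fills out[] and returns the number of ranges used (1 or 2).
 */
static int split_guest_ram(uint64_t ram_size, struct ram_range out[2])
{
    if (ram_size <= GAP_START) {
        out[0] = (struct ram_range){ 0, ram_size };
        return 1;
    }

    /* Low range fills everything below the gap ... */
    out[0] = (struct ram_range){ 0, GAP_START };
    /* ... and the remainder is placed above 4GB. */
    out[1] = (struct ram_range){ GAP_END, ram_size - GAP_START };
    return 2;
}
```

For a 5GB guest (0x140000000 bytes) this gives a low range of 0xe0000000 bytes at address 0 and a high range of 0x60000000 bytes at 0x100000000.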