
[1/2] kvm tools: Add memory gap for larger RAM sizes

Message ID 1305061513-28360-1-git-send-email-levinsasha928@gmail.com (mailing list archive)
State New, archived

Commit Message

Sasha Levin May 10, 2011, 9:05 p.m. UTC
Add a memory gap between 0xe0000000 and 0x100000000
when using more than 0xe0000000 bytes for guest RAM.

This space is used by several things, for example the
PCI configuration space.

This patch updates the e820 table, slot allocations
used for KVM_SET_USER_MEMORY_REGION, and the address
translation.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 tools/kvm/bios.c             |   27 +++++++++++++++++++++------
 tools/kvm/include/kvm/e820.h |    2 +-
 tools/kvm/include/kvm/kvm.h  |    9 ++++++++-
 tools/kvm/kvm.c              |   22 ++++++++++++++++------
 4 files changed, 46 insertions(+), 14 deletions(-)
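
To make the address translation concrete: guest-physical addresses below
0xe0000000 map 1:1 into the host RAM buffer, and addresses at or above
0x100000000 are shifted down by the 512MB gap. A standalone sketch of that
arithmetic follows; the names GAP_START, GAP_END and flat_to_host_off are
made up for this illustration and do not appear in the patch:

#include <stdio.h>

#define GAP_START	0xe0000000ULL	/* 3.5GB: start of the PCI hole */
#define GAP_END		0x100000000ULL	/* 4GB: first address above it */

/* Offset into the host RAM buffer for a flat guest-physical address. */
static unsigned long long flat_to_host_off(unsigned long long guest)
{
	if (guest < GAP_START)
		return guest;			/* below the hole: identity */
	return GAP_START + (guest - GAP_END);	/* above 4GB: gap removed */
}

int main(void)
{
	/* Prints "1000 e0000000": the first byte of RAM above 4GB sits
	 * right after the 3.5GB of low RAM in the host buffer. */
	printf("%llx %llx\n", flat_to_host_off(0x1000ULL),
	       flat_to_host_off(0x100000000ULL));
	return 0;
}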

Comments

Pekka Enberg May 11, 2011, 5:37 a.m. UTC | #1
On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> +       if (kvm->ram_size < 0xe0000000) {

Please use the ULL postfix for constants to ensure the types are sane.
Also, please come up with a sane name for these.

> @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
>
>  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
>  {
> -       return self->ram_start + offset;
> +       /*
> +        * We have a gap between 0xe0000000 and 0x100000000.
> +        * Consider it when translating an address above 0x100000000.
> +        */
> +       if (offset < 0xe0000000)
> +               return self->ram_start + offset;
> +       else
> +               return self->ram_start + 0xe0000000 + (offset - 0x100000000);
>  }

Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
maps about two separate regions but guest_flat_to_host() would
work-as-is.

                        Pekka
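
A minimal sketch of what Pekka is suggesting, assuming a 64-bit host;
the sizes, flags and error handling are illustrative only, not lifted
from tools/kvm:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t ram_size  = 8ULL << 30;			/* e.g. 8GB guest RAM */
	size_t gap_start = 0xe0000000ULL;		/* PCI hole at 3.5GB */
	size_t gap_size  = 0x100000000ULL - gap_start;	/* 512MB */

	/* One contiguous reservation covering guest RAM plus the hole. */
	void *ram = mmap(NULL, ram_size + gap_size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (ram == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Anonymous pages are only allocated on first touch; PROT_NONE
	 * ensures the hole is never touched, so it never costs RAM. */
	if (mprotect((char *)ram + gap_start, gap_size, PROT_NONE) == -1) {
		perror("mprotect");
		return 1;
	}

	/* guest_flat_to_host() can stay "ram_start + offset" as-is, while
	 * KVM and E820 are still told about two regions around the hole. */
	return 0;
}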
Sasha Levin May 11, 2011, 6:21 a.m. UTC | #2
On Wed, 2011-05-11 at 08:37 +0300, Pekka Enberg wrote:
> On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > +       if (kvm->ram_size < 0xe0000000) {
> 
> Please use the ULL postfix for constants to ensure the types are sane.
> Also, please come up with a sane name for these.
> 
> > @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
> >
> >  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
> >  {
> > -       return self->ram_start + offset;
> > +       /*
> > +        * We have a gap between 0xe0000000 and 0x100000000.
> > +        * Consider it when translating an address above 0x100000000.
> > +        */
> > +       if (offset < 0xe0000000)
> > +               return self->ram_start + offset;
> > +       else
> > +               return self->ram_start + 0xe0000000 + (offset - 0x100000000);
> >  }
> 
> Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
> region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
> maps about two separate regions but guest_flat_to_host() would
> work-as-is.

I wanted to avoid actually allocating that gap (which is currently
512MB) and instead take the hit in guest_flat_to_host().

If you feel the 512MB vs guest_flat_to_host() trade-off is worth it,
I'll change it to work that way.
Pekka Enberg May 11, 2011, 6:26 a.m. UTC | #3
On 5/11/11 9:21 AM, Sasha Levin wrote:
> If you feel the 512MB vs guest_flat_to_host() trade-off is worth it,
> I'll change it to work that way.

Why would it not be? This is 64-bit only, right? There's plenty of
virtual address space and mprotect() should make sure we never allocate
physical pages for it. Sure, there's some in-kernel overhead involved
as well, but that's extremely small.

I'm not worried about performance in guest_flat_to_host() but I think
the current implementation is not very clean. If you want to mmap() two
separate regions, we should have our own internal "memory map" that's
used for this (and for populating KVM and E820 maps).

So I think mmap'ing the gap is the cleanest solution for now. We can
revisit the decision if we need even more regions in the future.

                     Pekka
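
The internal "memory map" Pekka mentions might look something like the
sketch below; every identifier here is hypothetical, since no such table
exists in tools/kvm at this point:

#include <stdint.h>

/* One entry per contiguous bank of guest RAM. */
struct mem_bank {
	uint64_t	guest_phys;	/* guest-physical start */
	uint64_t	size;		/* length in bytes */
	void		*host;		/* host mapping backing the bank */
};

static struct mem_bank banks[2];
static int nr_banks;

/* Split RAM around the 3.5GB-4GB PCI hole. Both KVM slot registration
 * (KVM_SET_USER_MEMORY_REGION) and E820 generation would then walk
 * this one table instead of each hard-coding the split. */
static void mem_banks_init(void *ram_start, uint64_t ram_size)
{
	if (ram_size < 0xe0000000ULL) {
		banks[nr_banks++] = (struct mem_bank) {
			.guest_phys	= 0,
			.size		= ram_size,
			.host		= ram_start,
		};
		return;
	}
	banks[nr_banks++] = (struct mem_bank) {
		.guest_phys	= 0,
		.size		= 0xe0000000ULL,
		.host		= ram_start,
	};
	banks[nr_banks++] = (struct mem_bank) {
		.guest_phys	= 0x100000000ULL,
		.size		= ram_size - 0xe0000000ULL,
		.host		= (char *)ram_start + 0xe0000000ULL,
	};
}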
Ingo Molnar May 11, 2011, 7:10 a.m. UTC | #4
* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On 5/11/11 9:21 AM, Sasha Levin wrote:
> > If you feel the 512MB vs guest_flat_to_host() trade-off is worth it,
> > I'll change it to work that way.
> 
> Why would it not be? This is 64-bit only, right? There's plenty of virtual 
> address space and mprotect() should make sure we never allocate physical 
> pages for it. Sure, there's some in-kernel overhead involved as well, but 
> that's extremely small.
> 
> I'm not worried about performance in guest_flat_to_host() but I think the 
> current implementation is not very clean. If you want to mmap() two separate 
> regions, we should have our own internal "memory map" that's used for this 
> (and for populating KVM and E820 maps).
> 
> So I think mmap'ing the gap is the cleanest solution for now. We can revisit 
> the decision if we need even more regions in the future.

Agreed.

There's also admittedly somewhat of a conceptual beauty in having a linearly 
addressable chunk of *all* guest physical RAM on the hypervisor side.

Virtualization involves so many indirections to begin with that keeping the 
mental picture simpler is helpful IMHO ...

Thanks,

	Ingo
Avi Kivity May 11, 2011, 8:30 a.m. UTC | #5
On 05/11/2011 08:37 AM, Pekka Enberg wrote:
> On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > +       if (kvm->ram_size < 0xe0000000) {
>
> Please use the ULL postfix for constants to ensure the types are sane.
> Also, please come up with a sane name for these.
>
> > @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
> >
> >  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
> >  {
> > -       return self->ram_start + offset;
> > +       /*
> > +        * We have a gap between 0xe0000000 and 0x100000000.
> > +        * Consider it when translating an address above 0x100000000.
> > +        */
> > +       if (offset < 0xe0000000)
> > +               return self->ram_start + offset;
> > +       else
> > +               return self->ram_start + 0xe0000000 + (offset - 0x100000000);
> >  }
>
> Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
> region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
> maps about two separate regions but guest_flat_to_host() would
> work-as-is.

It doesn't work in general - if you have a PCI device with a BAR (like a
video card framebuffer), then you need allocations for main memory (0+)
and PCI (0xe0000000+). You can't have a contiguous mapping on i386
containing both.
Ingo Molnar May 11, 2011, 8:44 a.m. UTC | #6
* Avi Kivity <avi@redhat.com> wrote:

> On 05/11/2011 08:37 AM, Pekka Enberg wrote:
> > On Wed, May 11, 2011 at 12:05 AM, Sasha Levin <levinsasha928@gmail.com> wrote:
> > > +       if (kvm->ram_size < 0xe0000000) {
> >
> > Please use the ULL postfix for constants to ensure the types are sane.
> > Also, please come up with a sane name for these.
> >
> > > @@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
> > >
> > >  static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
> > >  {
> > > -       return self->ram_start + offset;
> > > +       /*
> > > +        * We have a gap between 0xe0000000 and 0x100000000.
> > > +        * Consider it when translating an address above 0x100000000.
> > > +        */
> > > +       if (offset < 0xe0000000)
> > > +               return self->ram_start + offset;
> > > +       else
> > > +               return self->ram_start + 0xe0000000 + (offset - 0x100000000);
> > >  }
> >
> > Would it not be simpler to mmap() a "ram_size + gap_size" contiguous
> > region and mprotect(PROT_NONE) the gap? We'd still tell KVM and E820
> > maps about two separate regions but guest_flat_to_host() would
> > work-as-is.
>
> It doesn't work in general - if you have a PCI device with a BAR
> (like a video card framebuffer), then you need allocations for main
> memory (0+) and PCI (0xe0000000+). You can't have a contiguous
> mapping on i386 containing both.

I think in tools/kvm/ we can ignore i386 hosts that would like to map more RAM 
than they have virtual address space for ...

Guests up to 1-2 gigs of RAM will still work fine.

Thanks,

	Ingo
Avi Kivity May 11, 2011, 9:06 a.m. UTC | #7
On 05/11/2011 11:44 AM, Ingo Molnar wrote:
> > It doesn't work in general - if you have a PCI device with a BAR
> > (like a video card framebuffer), then you need allocations for main
> > memory (0+) and PCI (0xe0000000+). You can't have a contiguous
> > mapping on i386 containing both.
>
> I think in tools/kvm/ we can ignore i386 hosts that would like to map more RAM
> than they have virtual address space for ...
>
> Guests up to 1-2 gigs of RAM will still work fine.

It doesn't work. Consider a guest with 128MB of RAM mapped at 0-128MB
and an 8MB framebuffer mapped at 0xe0000000. There's no way to present
this in a contiguous space.

Well, I guess you can map the framebuffer lower, but that means giving
up memory hotplug if you ever wish to implement it (and doesn't allow
the guest to remap the framebuffer if it wishes to).
Pekka Enberg May 11, 2011, 9:28 a.m. UTC | #8
On 5/11/11 12:06 PM, Avi Kivity wrote:
> Well, I guess you can map the framebuffer lower, but that means giving 
> up memory hotplug if you ever wish to implement it (and doesn't allow 
> the guest to remap the framebuffer if it wishes to).

True. As I said, we need to do it properly at some point. But as long
as it's only about the PCI hole for > 4GB guests, let's mmap() the
whole range instead.

                         Pekka

Patch

diff --git a/tools/kvm/bios.c b/tools/kvm/bios.c
index 2199c0c..cd417fa 100644
--- a/tools/kvm/bios.c
+++ b/tools/kvm/bios.c
@@ -61,7 +61,7 @@ static void e820_setup(struct kvm *kvm)
 	size		= guest_flat_to_host(kvm, E820_MAP_SIZE);
 	mem_map		= guest_flat_to_host(kvm, E820_MAP_START);
 
-	*size		= E820_MEM_AREAS;
+
 
 	mem_map[i++]	= (struct e820_entry) {
 		.addr		= REAL_MODE_IVT_BEGIN,
@@ -78,13 +78,28 @@ static void e820_setup(struct kvm *kvm)
 		.size		= MB_BIOS_END - MB_BIOS_BEGIN,
 		.type		= E820_MEM_RESERVED,
 	};
-	mem_map[i++]	= (struct e820_entry) {
-		.addr		= BZ_KERNEL_START,
-		.size		= kvm->ram_size - BZ_KERNEL_START,
-		.type		= E820_MEM_USABLE,
-	};
+	if (kvm->ram_size < 0xe0000000) {
+		mem_map[i++]	= (struct e820_entry) {
+			.addr		= BZ_KERNEL_START,
+			.size		= kvm->ram_size - BZ_KERNEL_START,
+			.type		= E820_MEM_USABLE,
+		};
+	} else {
+		mem_map[i++]	= (struct e820_entry) {
+			.addr		= BZ_KERNEL_START,
+			.size		= 0xe0000000 - BZ_KERNEL_START,
+			.type		= E820_MEM_USABLE,
+		};
+		mem_map[i++]	= (struct e820_entry) {
+			.addr		= 0x100000000ULL,
+			.size		= kvm->ram_size - 0xe0000000 - BZ_KERNEL_START,
+			.type		= E820_MEM_USABLE,
+		};
+	}
 
 	BUILD_BUG_ON(i > E820_MEM_AREAS);
+
+	*size			= i;
 }
 
 /**
diff --git a/tools/kvm/include/kvm/e820.h b/tools/kvm/include/kvm/e820.h
index 252ae1f..e0f5f2a 100644
--- a/tools/kvm/include/kvm/e820.h
+++ b/tools/kvm/include/kvm/e820.h
@@ -8,7 +8,7 @@ 
 #define E820_MEM_USABLE		1
 #define E820_MEM_RESERVED	2
 
-#define E820_MEM_AREAS		4
+#define E820_MEM_AREAS		5
 
 struct e820_entry {
 	u64	addr;	/* start of memory segment */
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 3dab78d..e9c16ea 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -60,7 +60,14 @@ static inline u32 segment_to_flat(u16 selector, u16 offset)
 
 static inline void *guest_flat_to_host(struct kvm *self, unsigned long offset)
 {
-	return self->ram_start + offset;
+	/*
+	 * We have a gap between 0xe0000000 and 0x100000000.
+	 * Consider it when translating an address above 0x100000000.
+	 */
+	if (offset < 0xe0000000)
+		return self->ram_start + offset;
+	else
+		return self->ram_start + 0xe0000000 + (offset - 0x100000000);
 }
 
 static inline void *guest_real_to_host(struct kvm *self, u16 selector, u16 offset)
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 65793f2..976b099 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -153,23 +153,33 @@ static bool kvm__cpu_supports_vm(void)
 	return regs.ecx & (1 << feature);
 }
 
-void kvm__init_ram(struct kvm *self)
+static void kvm_register_mem_slot(struct kvm *kvm, u32 slot, u64 guest_phys, u64 size, u64 userspace_addr)
 {
 	struct kvm_userspace_memory_region mem;
 	int ret;
 
 	mem = (struct kvm_userspace_memory_region) {
-		.slot			= 0,
-		.guest_phys_addr	= 0x0UL,
-		.memory_size		= self->ram_size,
-		.userspace_addr		= (unsigned long) self->ram_start,
+		.slot			= slot,
+		.guest_phys_addr	= guest_phys,
+		.memory_size		= size,
+		.userspace_addr		= userspace_addr,
 	};
 
-	ret = ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
+	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
 	if (ret < 0)
 		die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
 }
 
+void kvm__init_ram(struct kvm *self)
+{
+	if (self->ram_size < 0xe0000000) {
+		kvm_register_mem_slot(self, 0, 0, self->ram_size, (u64)self->ram_start);
+	} else {
+		kvm_register_mem_slot(self, 0, 0, 0xe0000000, (u64)self->ram_start);
+		kvm_register_mem_slot(self, 1, 0x100000000ULL, self->ram_size - 0xe0000000, (u64)self->ram_start + 0xe0000000);
+	}
+}
+
 int kvm__max_cpus(struct kvm *self)
 {
 	int ret;