[1/3,V4] kvm tools: Add memory gap for larger RAM sizes

Message ID 4dcaa893.5925e30a.0f9e.125c@mx.google.com (mailing list archive)
State New, archived

Commit Message

Sasha Levin May 11, 2011, 3:17 p.m. UTC
From: Sasha Levin <levinsasha928@gmail.com>

e820 is expected to leave a memory gap within the low 32
bits of RAM space. From the documentation of e820_setup_gap():
/*
 * Search for the biggest gap in the low 32 bits of the e820
 * memory space.  We pass this space to PCI to assign MMIO resources
 * for hotplug or unconfigured devices in.
 * Hopefully the BIOS let enough space left.
 */

Not leaving such a gap causes errors and hangs during the boot
process.

This patch adds a memory gap between 0xe0000000 and 0x100000000
when using more than 0xe0000000 bytes for guest RAM.

This patch updates the e820 table and the slot allocations
used for KVM_SET_USER_MEMORY_REGION.
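
As a rough illustration of the layout described above, the split can be sketched as a small standalone helper. This is only a sketch under assumptions: struct ram_range and split_guest_ram() are hypothetical names that do not exist in tools/kvm, and GAP_SIZE/GAP_START simply mirror the values of KVM_32BIT_GAP_SIZE and KVM_32BIT_GAP_START from this patch. It assumes the host mapping is allocated with the gap included (as done since V2), so host offsets equal guest physical addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as KVM_32BIT_GAP_SIZE / KVM_32BIT_GAP_START in this patch. */
#define GAP_SIZE	(512ULL << 20)
#define GAP_START	((1ULL << 32) - GAP_SIZE)	/* 0xe0000000 */
#define GAP_END		(1ULL << 32)			/* 0x100000000 */

/* Hypothetical descriptor for one contiguous run of guest RAM. */
struct ram_range {
	uint64_t guest_phys;	/* guest physical start address */
	uint64_t host_offset;	/* offset from the start of the host mapping */
	uint64_t size;		/* length in bytes */
};

/*
 * Fill 'out' (room for two entries) with the RAM ranges needed for
 * 'ram_size' bytes of guest RAM and return how many were written.
 */
static size_t split_guest_ram(uint64_t ram_size, struct ram_range out[2])
{
	if (ram_size < GAP_START) {
		/* Everything fits below the gap: one range. */
		out[0] = (struct ram_range){ 0, 0, ram_size };
		return 1;
	}

	/* Low range from zero up to the gap. */
	out[0] = (struct ram_range){ 0, 0, GAP_START };
	/* Remainder is placed at 4GB; the host mapping has the same hole. */
	out[1] = (struct ram_range){ GAP_END, GAP_END, ram_size - GAP_START };
	return 2;
}
```

Each returned range would correspond to one usable e820 entry and one KVM_SET_USER_MEMORY_REGION slot.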

Changes in V2:
 - Allocate RAM with the gap to avoid altering the translation code.
 - New patch description.

Changes in V3:
 - Remove unnecessary casts.

Changes in V4:
 - Rewrite kvm__init_ram().
 - Document the 64bit gap within the code.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 tools/kvm/bios.c             |   27 ++++++++++++----
 tools/kvm/include/kvm/e820.h |    2 +-
 tools/kvm/include/kvm/kvm.h  |    2 +
 tools/kvm/kvm.c              |   66 +++++++++++++++++++++++++++++++++++++----
 4 files changed, 82 insertions(+), 15 deletions(-)

Comments

Ingo Molnar May 11, 2011, 3:30 p.m. UTC | #1
* levinsasha928@gmail.com <levinsasha928@gmail.com> wrote:

> @@ -225,7 +266,18 @@ struct kvm *kvm__init(const char *kvm_dev, unsigned long ram_size)
>  
>  	self->ram_size		= ram_size;
>  
> -	self->ram_start = mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> +	if (self->ram_size < KVM_32BIT_GAP_START) {
> +		self->ram_start = mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> +	} else {
> +		self->ram_start = mmap(NULL, ram_size + KVM_32BIT_GAP_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> +		if (self->ram_start != MAP_FAILED) {
> +			/*
> +			 * We mprotect the gap (see kvm__init_ram() for details) PROT_NONE so that
> +			 * if we accidently write to it, we will know.
> +			 */
> +			mprotect(self->ram_start + KVM_32BIT_GAP_START, KVM_32BIT_GAP_SIZE, PROT_NONE);

Nit: the mmap() calls here wrap past the end of the line. It would be a lot 
easier to read if kvm.h defined two helpers, like:

	#define PROT_RW		(PROT_READ|PROT_WRITE)
	#define MAP_ANON	(MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE)

So the mmap() calls would become a bit more readable:

> +	if (self->ram_size < KVM_32BIT_GAP_START) {
> +		self->ram_start = mmap(NULL, ram_size, PROT_RW, MAP_ANON, -1, 0);
> +	} else {
> +		self->ram_start = mmap(NULL, ram_size + KVM_32BIT_GAP_SIZE, PROT_RW, MAP_ANON, -1, 0);
> +		if (self->ram_start != MAP_FAILED) {
> +			/*
> +			 * We mprotect the gap (see kvm__init_ram() for details) PROT_NONE so that
> +			 * if we accidently write to it, we will know.
> +			 */
> +			mprotect(self->ram_start + KVM_32BIT_GAP_START, KVM_32BIT_GAP_SIZE, PROT_NONE);
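
For reference, a standalone sketch of the same idea with the helpers in place. The names below are assumptions for illustration only: on glibc, <sys/mman.h> typically already defines MAP_ANON as a synonym for MAP_ANONYMOUS, so the sketch uses the hypothetical names KVM_PROT_RW and KVM_MAP_ANON to avoid a macro redefinition. map_ram_with_hole() is also hypothetical, and unlike the hunk above it checks the mprotect() return value.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper names, chosen so they do not clash with <sys/mman.h>. */
#define KVM_PROT_RW	(PROT_READ | PROT_WRITE)
#define KVM_MAP_ANON	(MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE)

/*
 * Map 'size' bytes and make [hole_start, hole_start + hole_size)
 * inaccessible, so that a stray access to the hole faults.
 * 'hole_start' and 'hole_size' must be page-aligned.
 * Returns MAP_FAILED on error.
 */
static void *map_ram_with_hole(size_t size, size_t hole_start, size_t hole_size)
{
	void *ram = mmap(NULL, size, KVM_PROT_RW, KVM_MAP_ANON, -1, 0);

	if (ram == MAP_FAILED)
		return MAP_FAILED;

	if (mprotect((char *)ram + hole_start, hole_size, PROT_NONE) < 0) {
		munmap(ram, size);
		return MAP_FAILED;
	}

	return ram;
}
```

With the values from the patch, the call would be along the lines of map_ram_with_hole(ram_size + KVM_32BIT_GAP_SIZE, KVM_32BIT_GAP_START, KVM_32BIT_GAP_SIZE).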

I'll test your series.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pekka Enberg May 11, 2011, 3:53 p.m. UTC | #2
On Wed, 11 May 2011, Ingo Molnar wrote:
> Ok, with your fixes applied i can boot a 10 GB RAM guest just fine - both the
> hang and the later IO errors and IO corruptions are gone:
>
> [root@aldebaran ~]# free
>              total       used       free     shared    buffers     cached
> Mem:      10019972     150436    9869536          0      79136      17296
> -/+ buffers/cache:      54004    9965968
> Swap:      4096568          0    4096568
>
> Acked-and-tested-by: Ingo Molnar <mingo@elte.hu>

Excellent! Thanks guys!

 			Pekka
Asias He May 12, 2011, 12:49 a.m. UTC | #3
On 05/11/2011 11:17 PM, levinsasha928@gmail.com wrote:
> From: Sasha Levin <levinsasha928@gmail.com>
> 
> e820 is expected to leave a memory gap within the low 32
> bits of RAM space. From the documentation of e820_setup_gap():
> /*
>  * Search for the biggest gap in the low 32 bits of the e820
>  * memory space.  We pass this space to PCI to assign MMIO resources
>  * for hotplug or unconfigured devices in.
>  * Hopefully the BIOS let enough space left.
>  */
> 
> Not leaving such a gap causes errors and hangs during the boot
> process.
> 
> This patch adds a memory gap between 0xe0000000 and 0x100000000
> when using more than 0xe0000000 bytes for guest RAM.
> 
> This patch updates the e820 table and the slot allocations
> used for KVM_SET_USER_MEMORY_REGION.
> 
> Changes in V2:
>  - Allocate RAM with the gap to avoid altering the translation code.
>  - New patch description.
> 
> Changes in V3:
>  - Remove unnecessary casts.
> 
> Changes in V4:
>  - Rewrite kvm__init_ram().
>  - Document the 64bit gap within the code.
> 
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
>  tools/kvm/bios.c             |   27 ++++++++++++----
>  tools/kvm/include/kvm/e820.h |    2 +-
>  tools/kvm/include/kvm/kvm.h  |    2 +
>  tools/kvm/kvm.c              |   66 +++++++++++++++++++++++++++++++++++++----
>  4 files changed, 82 insertions(+), 15 deletions(-)
> 
> diff --git a/tools/kvm/bios.c b/tools/kvm/bios.c
> index 2199c0c..3cd9b24 100644
> --- a/tools/kvm/bios.c
> +++ b/tools/kvm/bios.c
> @@ -61,8 +61,6 @@ static void e820_setup(struct kvm *kvm)
>  	size		= guest_flat_to_host(kvm, E820_MAP_SIZE);
>  	mem_map		= guest_flat_to_host(kvm, E820_MAP_START);
>  
> -	*size		= E820_MEM_AREAS;
> -
>  	mem_map[i++]	= (struct e820_entry) {
>  		.addr		= REAL_MODE_IVT_BEGIN,
>  		.size		= EBDA_START - REAL_MODE_IVT_BEGIN,
> @@ -78,13 +76,28 @@ static void e820_setup(struct kvm *kvm)
>  		.size		= MB_BIOS_END - MB_BIOS_BEGIN,
>  		.type		= E820_MEM_RESERVED,
>  	};
> -	mem_map[i++]	= (struct e820_entry) {
> -		.addr		= BZ_KERNEL_START,
> -		.size		= kvm->ram_size - BZ_KERNEL_START,
> -		.type		= E820_MEM_USABLE,
> -	};
> +	if (kvm->ram_size < KVM_32BIT_GAP_START) {
> +		mem_map[i++]	= (struct e820_entry) {
> +			.addr		= BZ_KERNEL_START,
> +			.size		= kvm->ram_size - BZ_KERNEL_START,
> +			.type		= E820_MEM_USABLE,
> +		};
> +	} else {
> +		mem_map[i++]	= (struct e820_entry) {
> +			.addr		= BZ_KERNEL_START,
> +			.size		= KVM_32BIT_GAP_START - BZ_KERNEL_START,
> +			.type		= E820_MEM_USABLE,
> +		};
> +		mem_map[i++]	= (struct e820_entry) {
> +			.addr		= 0x100000000ULL,
> +			.size		= kvm->ram_size - KVM_32BIT_GAP_START,
> +			.type		= E820_MEM_USABLE,
> +		};
> +	}
>  
>  	BUILD_BUG_ON(i > E820_MEM_AREAS);
> +
> +	*size			= i;
>  }
>  
>  /**
> diff --git a/tools/kvm/include/kvm/e820.h b/tools/kvm/include/kvm/e820.h
> index 252ae1f..e0f5f2a 100644
> --- a/tools/kvm/include/kvm/e820.h
> +++ b/tools/kvm/include/kvm/e820.h
> @@ -8,7 +8,7 @@
>  #define E820_MEM_USABLE		1
>  #define E820_MEM_RESERVED	2
>  
> -#define E820_MEM_AREAS		4
> +#define E820_MEM_AREAS		5
>  
>  struct e820_entry {
>  	u64	addr;	/* start of memory segment */
> diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
> index 3dab78d..5e2e64c 100644
> --- a/tools/kvm/include/kvm/kvm.h
> +++ b/tools/kvm/include/kvm/kvm.h
> @@ -8,6 +8,8 @@
>  #include <time.h>
>  
>  #define KVM_NR_CPUS		(255)
> +#define KVM_32BIT_GAP_SIZE	(512 << 20)
> +#define KVM_32BIT_GAP_START	((1ULL << 32) - KVM_32BIT_GAP_SIZE)
>  
>  struct kvm {
>  	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
> diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
> index 65793f2..a3d3dd8 100644
> --- a/tools/kvm/kvm.c
> +++ b/tools/kvm/kvm.c
> @@ -153,23 +153,64 @@ static bool kvm__cpu_supports_vm(void)
>  	return regs.ecx & (1 << feature);
>  }
>  
> -void kvm__init_ram(struct kvm *self)
> +static void kvm_register_mem_slot(struct kvm *kvm, u32 slot, u64 guest_phys, u64 size, void *userspace_addr)
>  {
>  	struct kvm_userspace_memory_region mem;
>  	int ret;
>  
>  	mem = (struct kvm_userspace_memory_region) {
> -		.slot			= 0,
> -		.guest_phys_addr	= 0x0UL,
> -		.memory_size		= self->ram_size,
> -		.userspace_addr		= (unsigned long) self->ram_start,
> +		.slot			= slot,
> +		.guest_phys_addr	= guest_phys,
> +		.memory_size		= size,
> +		.userspace_addr		= (u64)userspace_addr,


I am seeing:

  CC       kvm.o
kvm.c: In function ‘kvm_register_mem_slot’:
kvm.c:165:22: error: cast from pointer to integer of different size
[-Werror=pointer-to-int-cast]
cc1: all warnings being treated as errors

make: *** [kvm.o] Error 1

with this patch on 32-bit box.
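
One possible way to avoid the warning, sketched here as an assumption rather than as the fix that was eventually applied, is to go through an integer type that has the same width as a pointer before widening to 64 bits. host_ptr_to_u64() is a hypothetical helper name; the userspace_addr field of struct kvm_userspace_memory_region is a 64-bit integer, which is why the widening is needed on 32-bit hosts.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a host pointer to the 64-bit integer
 * expected by the userspace_addr field.
 *
 * uintptr_t has the same width as a pointer, so the first cast does not
 * change size; the second cast then zero-extends on 32-bit hosts.
 */
static inline uint64_t host_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}
```

The original code used a cast to unsigned long, which works for the same reason on Linux, where unsigned long is as wide as a pointer.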


>  	};
>  
> -	ret = ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
> +	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
>  	if (ret < 0)
>  		die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
>  }
>  
> +/*
> + * Allocating RAM size bigger than 4GB requires us to leave a gap
> + * in the RAM which is used for PCI MMIO, hotplug, and unconfigured
> + * devices (see documentation of e820_setup_gap() for details).
> + *
> + * If we're required to initialize RAM bigger than 4GB, we will create
> + * a gap between 0xe0000000 and 0x100000000 in the guest virtual mem space.
> + */
> +
> +void kvm__init_ram(struct kvm *self)
> +{
> +	u64	phys_start, phys_size;
> +	void	*host_mem;
> +
> +	if (self->ram_size < KVM_32BIT_GAP_START) {
> +		/* Use a single block of RAM for 32bit RAM */
> +
> +		phys_start = 0;
> +		phys_size  = self->ram_size;
> +		host_mem   = self->ram_start;
> +
> +		kvm_register_mem_slot(self, 0, 0, self->ram_size, self->ram_start);
> +	} else {
> +		/* First RAM range from zero to the PCI gap: */
> +
> +		phys_start = 0;
> +		phys_size  = KVM_32BIT_GAP_START;
> +		host_mem   = self->ram_start;
> +
> +		kvm_register_mem_slot(self, 0, phys_start, phys_size, host_mem);
> +
> +		/* Second RAM range from 4GB to the end of RAM: */
> +
> +		phys_start = 0x100000000ULL;
> +		phys_size  = self->ram_size - phys_size;
> +		host_mem   = self->ram_start + phys_start;
> +
> +		kvm_register_mem_slot(self, 1, phys_start, phys_size, host_mem);
> +	}
> +}
> +
>  int kvm__max_cpus(struct kvm *self)
>  {
>  	int ret;
> @@ -225,7 +266,18 @@ struct kvm *kvm__init(const char *kvm_dev, unsigned long ram_size)
>  
>  	self->ram_size		= ram_size;
>  
> -	self->ram_start = mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> +	if (self->ram_size < KVM_32BIT_GAP_START) {
> +		self->ram_start = mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> +	} else {
> +		self->ram_start = mmap(NULL, ram_size + KVM_32BIT_GAP_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
> +		if (self->ram_start != MAP_FAILED) {
> +			/*
> +			 * We mprotect the gap (see kvm__init_ram() for details) PROT_NONE so that
> +			 * if we accidently write to it, we will know.
> +			 */
> +			mprotect(self->ram_start + KVM_32BIT_GAP_START, KVM_32BIT_GAP_SIZE, PROT_NONE);
> +		}
> +	}
>  	if (self->ram_start == MAP_FAILED)
>  		die("out of memory");
>
Ingo Molnar May 12, 2011, 7:34 a.m. UTC | #4
* Asias He <asias.hejun@gmail.com> wrote:

> On 05/11/2011 11:17 PM, levinsasha928@gmail.com wrote:
> > diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
> > index 65793f2..a3d3dd8 100644
> > --- a/tools/kvm/kvm.c
> > +++ b/tools/kvm/kvm.c
> > @@ -153,23 +153,64 @@ static bool kvm__cpu_supports_vm(void)
> >  	return regs.ecx & (1 << feature);
> >  }
> >  
> > -void kvm__init_ram(struct kvm *self)
> > +static void kvm_register_mem_slot(struct kvm *kvm, u32 slot, u64 guest_phys, u64 size, void *userspace_addr)
> >  {
> >  	struct kvm_userspace_memory_region mem;
> >  	int ret;
> >  
> >  	mem = (struct kvm_userspace_memory_region) {
> > -		.slot			= 0,
> > -		.guest_phys_addr	= 0x0UL,
> > -		.memory_size		= self->ram_size,
> > -		.userspace_addr		= (unsigned long) self->ram_start,
> > +		.slot			= slot,
> > +		.guest_phys_addr	= guest_phys,
> > +		.memory_size		= size,
> > +		.userspace_addr		= (u64)userspace_addr,
> 
> 
> I am seeing:
> 
>   CC       kvm.o
> kvm.c: In function ‘kvm_register_mem_slot’:
> kvm.c:165:22: error: cast from pointer to integer of different size
> [-Werror=pointer-to-int-cast]
> cc1: all warnings being treated as errors
> 
> make: *** [kvm.o] Error 1
> 
> with this patch on 32-bit box.

It's useful if you report the 'gcc -v' output for compiler warnings, so that 
over time we can build a mental picture of which GCC versions produce 
spurious warnings.

Btw, it would also be useful to add a 'make WERROR=0' option. That way, 
warnings that trigger only on some GCC versions can be skipped during the build, 
while they will still be reported to us because the default 'make' fails.

[ If you add that then please also send such a patch against tools/perf/ :-) ]

Thanks,

	Ingo
Patch

diff --git a/tools/kvm/bios.c b/tools/kvm/bios.c
index 2199c0c..3cd9b24 100644
--- a/tools/kvm/bios.c
+++ b/tools/kvm/bios.c
@@ -61,8 +61,6 @@  static void e820_setup(struct kvm *kvm)
 	size		= guest_flat_to_host(kvm, E820_MAP_SIZE);
 	mem_map		= guest_flat_to_host(kvm, E820_MAP_START);
 
-	*size		= E820_MEM_AREAS;
-
 	mem_map[i++]	= (struct e820_entry) {
 		.addr		= REAL_MODE_IVT_BEGIN,
 		.size		= EBDA_START - REAL_MODE_IVT_BEGIN,
@@ -78,13 +76,28 @@  static void e820_setup(struct kvm *kvm)
 		.size		= MB_BIOS_END - MB_BIOS_BEGIN,
 		.type		= E820_MEM_RESERVED,
 	};
-	mem_map[i++]	= (struct e820_entry) {
-		.addr		= BZ_KERNEL_START,
-		.size		= kvm->ram_size - BZ_KERNEL_START,
-		.type		= E820_MEM_USABLE,
-	};
+	if (kvm->ram_size < KVM_32BIT_GAP_START) {
+		mem_map[i++]	= (struct e820_entry) {
+			.addr		= BZ_KERNEL_START,
+			.size		= kvm->ram_size - BZ_KERNEL_START,
+			.type		= E820_MEM_USABLE,
+		};
+	} else {
+		mem_map[i++]	= (struct e820_entry) {
+			.addr		= BZ_KERNEL_START,
+			.size		= KVM_32BIT_GAP_START - BZ_KERNEL_START,
+			.type		= E820_MEM_USABLE,
+		};
+		mem_map[i++]	= (struct e820_entry) {
+			.addr		= 0x100000000ULL,
+			.size		= kvm->ram_size - KVM_32BIT_GAP_START,
+			.type		= E820_MEM_USABLE,
+		};
+	}
 
 	BUILD_BUG_ON(i > E820_MEM_AREAS);
+
+	*size			= i;
 }
 
 /**
diff --git a/tools/kvm/include/kvm/e820.h b/tools/kvm/include/kvm/e820.h
index 252ae1f..e0f5f2a 100644
--- a/tools/kvm/include/kvm/e820.h
+++ b/tools/kvm/include/kvm/e820.h
@@ -8,7 +8,7 @@ 
 #define E820_MEM_USABLE		1
 #define E820_MEM_RESERVED	2
 
-#define E820_MEM_AREAS		4
+#define E820_MEM_AREAS		5
 
 struct e820_entry {
 	u64	addr;	/* start of memory segment */
diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 3dab78d..5e2e64c 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -8,6 +8,8 @@ 
 #include <time.h>
 
 #define KVM_NR_CPUS		(255)
+#define KVM_32BIT_GAP_SIZE	(512 << 20)
+#define KVM_32BIT_GAP_START	((1ULL << 32) - KVM_32BIT_GAP_SIZE)
 
 struct kvm {
 	int			sys_fd;		/* For system ioctls(), i.e. /dev/kvm */
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 65793f2..a3d3dd8 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -153,23 +153,64 @@  static bool kvm__cpu_supports_vm(void)
 	return regs.ecx & (1 << feature);
 }
 
-void kvm__init_ram(struct kvm *self)
+static void kvm_register_mem_slot(struct kvm *kvm, u32 slot, u64 guest_phys, u64 size, void *userspace_addr)
 {
 	struct kvm_userspace_memory_region mem;
 	int ret;
 
 	mem = (struct kvm_userspace_memory_region) {
-		.slot			= 0,
-		.guest_phys_addr	= 0x0UL,
-		.memory_size		= self->ram_size,
-		.userspace_addr		= (unsigned long) self->ram_start,
+		.slot			= slot,
+		.guest_phys_addr	= guest_phys,
+		.memory_size		= size,
+		.userspace_addr		= (u64)userspace_addr,
 	};
 
-	ret = ioctl(self->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
+	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
 	if (ret < 0)
 		die_perror("KVM_SET_USER_MEMORY_REGION ioctl");
 }
 
+/*
+ * Guest RAM that reaches KVM_32BIT_GAP_START (0xe0000000) requires us to
+ * leave a gap below 4GB which is used for PCI MMIO, hotplug, and
+ * unconfigured devices (see documentation of e820_setup_gap() for details).
+ *
+ * If we're required to initialize that much RAM, we will create a gap
+ * between 0xe0000000 and 0x100000000 in the guest physical address space.
+ */
+
+void kvm__init_ram(struct kvm *self)
+{
+	u64	phys_start, phys_size;
+	void	*host_mem;
+
+	if (self->ram_size < KVM_32BIT_GAP_START) {
+		/* Use a single block of RAM for 32bit RAM */
+
+		phys_start = 0;
+		phys_size  = self->ram_size;
+		host_mem   = self->ram_start;
+
+		kvm_register_mem_slot(self, 0, 0, self->ram_size, self->ram_start);
+	} else {
+		/* First RAM range from zero to the PCI gap: */
+
+		phys_start = 0;
+		phys_size  = KVM_32BIT_GAP_START;
+		host_mem   = self->ram_start;
+
+		kvm_register_mem_slot(self, 0, phys_start, phys_size, host_mem);
+
+		/* Second RAM range from 4GB to the end of RAM: */
+
+		phys_start = 0x100000000ULL;
+		phys_size  = self->ram_size - phys_size;
+		host_mem   = self->ram_start + phys_start;
+
+		kvm_register_mem_slot(self, 1, phys_start, phys_size, host_mem);
+	}
+}
+
 int kvm__max_cpus(struct kvm *self)
 {
 	int ret;
@@ -225,7 +266,18 @@  struct kvm *kvm__init(const char *kvm_dev, unsigned long ram_size)
 
 	self->ram_size		= ram_size;
 
-	self->ram_start = mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
+	if (self->ram_size < KVM_32BIT_GAP_START) {
+		self->ram_start = mmap(NULL, ram_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
+	} else {
+		self->ram_start = mmap(NULL, ram_size + KVM_32BIT_GAP_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
+		if (self->ram_start != MAP_FAILED) {
+			/*
+			 * We mprotect the gap (see kvm__init_ram() for details) PROT_NONE so that
+			 * if we accidentally write to it, we will know.
+			 */
+			mprotect(self->ram_start + KVM_32BIT_GAP_START, KVM_32BIT_GAP_SIZE, PROT_NONE);
+		}
+	}
 	if (self->ram_start == MAP_FAILED)
 		die("out of memory");