
[v3,14/22] microvm: use 2G split unconditionally

Message ID 20200520132003.9492-15-kraxel@redhat.com (mailing list archive)
State New, archived
Series microvm: add acpi support

Commit Message

Gerd Hoffmann May 20, 2020, 1:19 p.m. UTC
Looks like the logiv was copied over from q35.

q35 does this for backward compatibility, there is no reason to do this
on microvm though.  So split @ 2G unconditionally.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/i386/microvm.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)
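
(For readers without the surrounding code in their head: with the condition gone, microvm_memory_init() always splits RAM at a fixed 2G boundary, and whatever does not fit below that limit ends up above 4G, ignoring the max-ram-below-4g option handled just below the hunk. The snippet below is a stand-alone sketch of that arithmetic, assuming the usual below-4G/above-4G placement; it is not the actual QEMU code.)

/* sketch only -- illustrates the effect of an unconditional 2G lowmem split */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    const uint64_t gib = 1ULL << 30;
    const uint64_t lowmem = 0x80000000ULL;   /* 2G, unconditional after this patch */
    const uint64_t ram_sizes[] = { 1 * gib, 2 * gib, 3 * gib, 8 * gib };

    for (unsigned i = 0; i < sizeof(ram_sizes) / sizeof(ram_sizes[0]); i++) {
        uint64_t ram_size = ram_sizes[i];
        uint64_t below_4g = ram_size < lowmem ? ram_size : lowmem;
        uint64_t above_4g = ram_size > lowmem ? ram_size - lowmem : 0;

        printf("ram %" PRIu64 "G: below 4G [0, %" PRIu64 "G), above 4G %" PRIu64 "G mapped at 0x100000000\n",
               ram_size / gib, below_4g / gib, above_4g / gib);
    }
    return 0;
}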

Comments

Philippe Mathieu-Daudé May 20, 2020, 1:34 p.m. UTC | #1
On 5/20/20 3:19 PM, Gerd Hoffmann wrote:
> Looks like the logiv was copied over from q35.

Typo 'logiv' -> 'logic'.

> 
> q35 does this for backward compatibility, there is no reason to do this
> on microvm though.  So split @ 2G unconditionally.

Yes please!

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Igor Mammedov May 21, 2020, 9:14 a.m. UTC | #2
On Wed, 20 May 2020 15:19:55 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

> Looks like the logiv was copied over from q35.
> 
> q35 does this for backward compatibility, there is no reason to do this
> on microvm though.  So split @ 2G unconditionally.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

Igor Mammedov May 21, 2020, 9:29 a.m. UTC | #3
On Wed, 20 May 2020 15:19:55 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

> Looks like the logiv was copied over from q35.
> 
> q35 does this for backward compatibility, there is no reason to do this
> on microvm though.  So split @ 2G unconditionally.

not related to your ACPI rework, but just an idea for future of microvm

I wonder if we should carry over all this fixed RAM layout legacy from pc/q35
with a bunch of knobs to tweak it (along with complicated logic).

Can we just re-use pc-dimms for main RAM and let user specify RAM layout the way they wish?
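
(For reference, and purely as an illustration of the existing pc-dimm interface on pc/q35 rather than anything microvm wires up today, the command-line side of that idea looks roughly like this; the option names are the regular QEMU ones, the sizes are arbitrary:)

qemu-system-x86_64 -M q35 -m 1G,slots=2,maxmem=9G \
    -object memory-backend-ram,id=mem0,size=4G \
    -device pc-dimm,id=dimm0,memdev=mem0

(The suggestion above would amount to letting such explicitly described memory devices define the main RAM layout, instead of the fixed lowmem/highmem regions the board sets up by itself.)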


Gerd Hoffmann May 25, 2020, 11:45 a.m. UTC | #4
On Thu, May 21, 2020 at 11:29:21AM +0200, Igor Mammedov wrote:
> On Wed, 20 May 2020 15:19:55 +0200
> Gerd Hoffmann <kraxel@redhat.com> wrote:
> 
> > Looks like the logiv was copied over from q35.
> > 
> > q35 does this for backward compatibility, there is no reason to do this
> > on microvm though.  So split @ 2G unconditionally.
> 
> not related to your ACPI rework, but just an idea for future of microvm
> 
> I wonder if we should carry over all this fixed RAM layout legacy from pc/q35
> with a bunch of knobs to tweak it (along with complicated logic).

Well, I think we can (should) drop max-ram-below-4g too.  There is
no reason to use that with microvm, other than shooting yourself in
the foot (by making mmio overlap with ram).

With that being gone too there isn't much logic left ...

take care,
  Gerd
Igor Mammedov May 25, 2020, 4:36 p.m. UTC | #5
On Mon, 25 May 2020 13:45:08 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

> On Thu, May 21, 2020 at 11:29:21AM +0200, Igor Mammedov wrote:
> > On Wed, 20 May 2020 15:19:55 +0200
> > Gerd Hoffmann <kraxel@redhat.com> wrote:
> >   
> > > Looks like the logiv was copied over from q35.
> > > 
> > > q35 does this for backward compatibility, there is no reason to do this
> > > on microvm though.  So split @ 2G unconditionally.  
> > 
> > not related to your ACPI rework, but just an idea for future of microvm
> > 
> > I wonder if we should carry over all this fixed RAM layout legacy from pc/q35
> > with a bunch of knobs to tweak it (along with complicated logic).  
> 
> Well, I think we can (should) drop max-ram-below-4g too.  There is
> no reason to use that with microvm, other than shooting yourself in
> the foot (by making mmio overlap with ram).
> 
> With that being gone too there isn't much logic left ...

I wonder if we need 2G split for microvm at all?
Can we map 1 contiguous big blob from 0 GPA and overlay bios & other x86 TOLUD stuff?


> take care,
>   Gerd
>
Gerd Hoffmann May 26, 2020, 4:48 a.m. UTC | #6
> > Well, I think we can (should) drop max-ram-below-4g too.  There is
> > no reason to use that with microvm, other than shooting yourself in
> > the foot (by making mmio overlap with ram).
> > 
> > With that being gone too there isn't much logic left ...
> 
> I wonder if we need 2G split for microvm at all?
> Can we map 1 contiguous big blob from 0 GPA and overlay bios & other x86 TOLUD stuff?

I think it would work, but it has some drawbacks:

  (1) we lose a bit of memory.
  (2) we lose a gigabyte page.
  (3) we wouldn't have guard pages (unused address space) between
      ram and mmio space.

take care,
  Gerd
Igor Mammedov May 27, 2020, 12:25 p.m. UTC | #7
On Tue, 26 May 2020 06:48:39 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

> > > Well, I think we can (should) drop max-ram-below-4g too.  There is
> > > no reason to use that with microvm, other than shooting yourself in
> > > the foot (by making mmio overlap with ram).
> > > 
> > > With that being gone too there isn't much logic left ...  
> > 
> > I wonder if we need 2G split for microvm at all?
> > Can we map 1 contiguous big blob from 0 GPA and overlay bios & other x86 TOLUD stuff?  
> 
> I think it would work, but it has some drawbacks:
> 
>   (1) we lose a bit of memory.
          it's probably not big enough to care about; we do a similar overlay mapping on pc/q35
          at the beginning of RAM
>   (2) we lose a gigabyte page.
          I'm not sure what exactly we lose in this case.
          Let's assume we're allocating the guest 5G of contiguous RAM using 1G huge pages;
          in this case I'd think that on the host side the MMIO overlay won't affect the RAM blob,
          on the guest side the page tables will be fragmented due to the MMIO holes, but the guest
          could still use smaller huge pages in the fragmented area and 1G pages where there is no fragmentation.

>   (3) we wouldn't have guard pages (unused address space) between
>       ram and mmio space.
           if it's the holes' mmio, then do we really need them (an access is going to terminate
           either in always-valid RAM or in a valid mmio hole)?
> 
> take care,
>   Gerd
> 
>
Paolo Bonzini May 27, 2020, 1:06 p.m. UTC | #8
On 27/05/20 14:25, Igor Mammedov wrote:
>>   (2) we lose a gigabyte page.
>           I'm not sure what exactly we lose in this case.
>           Let's assume we're allocating the guest 5G of contiguous RAM using 1G huge pages;
>           in this case I'd think that on the host side the MMIO overlay won't affect the RAM blob,
>           on the guest side the page tables will be fragmented due to the MMIO holes, but the guest
>           could still use smaller huge pages in the fragmented area and 1G pages where there is no fragmentation.

Access to the 3G-4G area would not be able to use 1G EPT pages.

But why use 2G split instead of 3G?  There's only very little MMIO and
no PCI hole (including no huge MMCONFIG BAR) on microvm.

Paolo
Igor Mammedov May 27, 2020, 2:26 p.m. UTC | #9
On Wed, 27 May 2020 15:06:28 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 27/05/20 14:25, Igor Mammedov wrote:
> >>   (2) we lose a gigabyte page.
> >           I'm not sure what exactly we lose in this case.
> >           Let's assume we're allocating the guest 5G of contiguous RAM using 1G huge pages;
> >           in this case I'd think that on the host side the MMIO overlay won't affect the RAM blob,
> >           on the guest side the page tables will be fragmented due to the MMIO holes, but the guest
> >           could still use smaller huge pages in the fragmented area and 1G pages where there is no fragmentation.
> 
> Access to the 3G-4G area would not be able to use 1G EPT pages.
Could it use 2M pages instead of 1G?
Do we really care about 1 gigabyte huge pages in the microvm intended use case?
(fast-starting VMs for microservices like FaaS, which are unlikely to use much memory to begin with)

> But why use 2G split instead of 3G?  There's only very little MMIO and
> no PCI hole (including no huge MMCONFIG BAR) on microvm.
> 
> Paolo
>
Igor Mammedov May 27, 2020, 2:35 p.m. UTC | #10
On Wed, 27 May 2020 16:26:46 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 27 May 2020 15:06:28 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> > On 27/05/20 14:25, Igor Mammedov wrote:  
> > >>   (2) we lose a gigabyte page.
> > >           I'm not sure what exactly we lose in this case.
> > >           Let's assume we're allocating the guest 5G of contiguous RAM using 1G huge pages;
> > >           in this case I'd think that on the host side the MMIO overlay won't affect the RAM blob,
> > >           on the guest side the page tables will be fragmented due to the MMIO holes, but the guest
> > >           could still use smaller huge pages in the fragmented area and 1G pages where there is no fragmentation.
> > 
> > Access to the 3G-4G area would not be able to use 1G EPT pages.  
> Could it use 2M pages instead of 1G?
> Do we really care about 1 gigabyte huge pages in the microvm intended use case?
> (fast-starting VMs for microservices like FaaS, which are unlikely to use much memory to begin with)

My interest in having a single memory region is the possibility of a drop-in conversion to [nv|pc-dimm] later on
without breaking the ABI. (I'm not sure that we actually need it, though.)


> > But why use 2G split instead of 3G?  There's only very little MMIO and
> > no PCI hole (including no huge MMCONFIG BAR) on microvm.
> > 
> > Paolo
> >   
> 
>
Paolo Bonzini May 27, 2020, 2:54 p.m. UTC | #11
On 27/05/20 16:26, Igor Mammedov wrote:
> On Wed, 27 May 2020 15:06:28 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 27/05/20 14:25, Igor Mammedov wrote:
>>>>   (2) we lose a gigabyte page.
>>>           I'm not sure what exactly we lose in this case.
>>>           Let's assume we're allocating the guest 5G of contiguous RAM using 1G huge pages;
>>>           in this case I'd think that on the host side the MMIO overlay won't affect the RAM blob,
>>>           on the guest side the page tables will be fragmented due to the MMIO holes, but the guest
>>>           could still use smaller huge pages in the fragmented area and 1G pages where there is no fragmentation.
>>
>> Access to the 3G-4G area would not be able to use 1G EPT pages.
> Could it use 2M pages instead of 1G?

Yes, probably a mix of 2 MiB pages and 4 KiB pages around the memslot
splits.

> Do we really care about 1 gigabyte huge pages in the microvm intended use case?
> (fast-starting VMs for microservices like FaaS, which are unlikely to use much memory to begin with)

I honestly don't think it's measurable, but at least in theory we care
because such workloads could have more TLB misses (relative to the
execution time) than long-lasting VMs.

Paolo
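
(To make the page-size argument concrete, here is a toy calculation: it only counts, from alignment and size constraints, how a guest-physical RAM piece could be tiled with 1G/2M/4K mappings. The 0xfee01000 start address is a made-up example of a piece that begins right above an MMIO region; real EPT/hugepage behaviour also depends on host-side alignment and backing, which this ignores.)

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* greedily cover [start, end) with the largest naturally aligned page that fits */
static void tile(const char *label, uint64_t start, uint64_t end)
{
    const uint64_t sizes[] = { 1ULL << 30, 2ULL << 20, 4ULL << 10 };  /* 1G, 2M, 4K */
    uint64_t count[3] = { 0, 0, 0 };

    while (start < end) {
        for (int i = 0; i < 3; i++) {
            if ((start & (sizes[i] - 1)) == 0 && start + sizes[i] <= end) {
                count[i]++;
                start += sizes[i];
                break;
            }
        }
    }
    printf("%-16s 1G x %" PRIu64 ", 2M x %" PRIu64 ", 4K x %" PRIu64 "\n",
           label, count[0], count[1], count[2]);
}

int main(void)
{
    /* piece pushed just above a hypothetical MMIO hole: no 1G mapping until the next 1G boundary */
    tile("unaligned piece", 0xfee01000ULL, 0x140000000ULL);
    /* piece starting exactly at 4G (what the lowmem split gives us): all 1G mappings */
    tile("4G-aligned piece", 0x100000000ULL, 0x140000000ULL);
    return 0;
}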
Gerd Hoffmann May 28, 2020, 7:17 a.m. UTC | #12
Hi,

> >   (1) we lose a bit of memory.
>           it's probably not big enough to care about; we do a similar overlay mapping on pc/q35
>           at the beginning of RAM

Yes, shouldn't be too much.

> >   (2) we lose a gigabyte page.
>           I'm not sure what exactly we lose in this case.

The 1G page for 0xc0000000 -> 0xffffffff (as explained by Paolo).

> >   (3) we wouldn't have guard pages (unused address space) between
> >       ram and mmio space.
>            if it's the holes' mmio, then do we really need them (an access is going to terminate
>            either in always-valid RAM or in a valid mmio hole)?

Not required, but more robust.  Less likely that the guest touches mmio
by accident.

I'd expect it also requires some e820 hacks.

cheers,
  Gerd
Gerd Hoffmann May 28, 2020, 7:19 a.m. UTC | #13
Hi,

> But why use 2G split instead of 3G?  There's only very little MMIO and
> no PCI hole (including no huge MMCONFIG BAR) on microvm.

Yes, we can go for 3G; we are indeed not short on address space ;)

take care,
  Gerd
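
(If the split does move to 3G later on, the follow-up on top of this patch would presumably be just the constant; an untested sketch, not a committed change:)

-    ram_addr_t lowmem = 0x80000000; /* 2G */
+    ram_addr_t lowmem = 0xc0000000; /* 3G */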

Patch

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 867d3d652145..b8f0d3283758 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -170,23 +170,9 @@  static void microvm_memory_init(MicrovmMachineState *mms)
     MemoryRegion *ram_below_4g, *ram_above_4g;
     MemoryRegion *system_memory = get_system_memory();
     FWCfgState *fw_cfg;
-    ram_addr_t lowmem;
+    ram_addr_t lowmem = 0x80000000; /* 2G */
     int i;
 
-    /*
-     * Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
-     * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
-     * also known as MMCFG).
-     * If it doesn't, we need to split it in chunks below and above 4G.
-     * In any case, try to make sure that guest addresses aligned at
-     * 1G boundaries get mapped to host addresses aligned at 1G boundaries.
-     */
-    if (machine->ram_size >= 0xb0000000) {
-        lowmem = 0x80000000;
-    } else {
-        lowmem = 0xb0000000;
-    }
-
     /*
      * Handle the machine opt max-ram-below-4g.  It is basically doing
      * min(qemu limit, user limit).