[RFC,2/4] spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM

Message ID 1475722987-18644-3-git-send-email-david@gibson.dropbear.id.au (mailing list archive)
State New, archived

Commit Message

David Gibson Oct. 6, 2016, 3:03 a.m. UTC
Currently the default PCI host bridge for the 'pseries' machine type is
constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
guest memory space.  This means that if > 1TiB of guest RAM is specified,
the RAM will collide with the PCI IO windows, causing serious problems.

Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
there's a little unused space at the bottom of the area reserved for PCI,
but essentially this means that > 1TiB of RAM has never worked with the
pseries machine type.

This patch fixes this by altering the placement of PHBs on large-RAM VMs.
Instead of always placing the first PHB at 1TiB, it is placed at the next
1 TiB boundary after the maximum RAM address.

Technically, this changes behaviour in a migration-breaking way for
existing machines with > 1TiB maximum memory, but since having > 1 TiB
memory was broken anyway, this seems like a reasonable trade-off.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
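
To make the collision and the new placement rule concrete, here is a
standalone sketch of the arithmetic (the alignment macros mirror the shape of
QEMU's osdep.h helpers; the example guest sizes are invented):

#include <assert.h>
#include <stdint.h>

/* Same shape as QEMU's QEMU_ALIGN_DOWN()/QEMU_ALIGN_UP() in osdep.h. */
#define ALIGN_DOWN(n, m) (((n) / (m)) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

int main(void)
{
    const uint64_t TIB         = 0x10000000000ULL;  /* 1 TiB */
    const uint64_t phb_spacing = 0x1000000000ULL;   /* 64 GiB per PHB */

    /* Old rule: PHB0's windows sit at [1 TiB, 1 TiB + 64 GiB), so any
     * guest whose RAM reaches past 1 TiB collides with them. */
    uint64_t old_phb0_base = TIB;
    uint64_t max_ram_addr  = 2 * TIB;               /* e.g. a 2 TiB guest */
    assert(max_ram_addr > old_phb0_base);           /* the collision */

    /* New rule: PHB0 moves to the next 1 TiB boundary above all RAM. */
    uint64_t new_phb0_base = ALIGN_UP(max_ram_addr, TIB);
    assert(new_phb0_base == 2 * TIB);               /* clear of RAM */

    /* Guests that fit under 1 TiB keep the old default placement. */
    assert(ALIGN_UP(TIB / 2, TIB) == TIB);          /* 512 GiB guest */

    /* Later PHBs follow at 64 GiB intervals, unchanged. */
    uint64_t phb1_base = new_phb0_base + phb_spacing;
    (void)phb1_base;
    return 0;
}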

Comments

Laurent Vivier Oct. 6, 2016, 7:21 a.m. UTC | #1
On 06/10/2016 05:03, David Gibson wrote:
> Currently the default PCI host bridge for the 'pseries' machine type is
> constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
> guest memory space.  This means that if > 1TiB of guest RAM is specified,
> the RAM will collide with the PCI IO windows, causing serious problems.
> 
> Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
> there's a little unused space at the bottom of the area reserved for PCI,
> but essentially this means that > 1TiB of RAM has never worked with the
> pseries machine type.
> 
> This patch fixes this by altering the placement of PHBs on large-RAM VMs.
> Instead of always placing the first PHB at 1TiB, it is placed at the next
> 1 TiB boundary after the maximum RAM address.
> 
> Technically, this changes behaviour in a migration-breaking way for
> existing machines with > 1TiB maximum memory, but since having > 1 TiB
> memory was broken anyway, this seems like a reasonable trade-off.

Perhaps you could add an SPAPR_COMPAT_XX property with the PHB0 base so as
not to break compatibility? I think spapr without a PCI card (only VIO, for
instance) should work with more than 1 TiB.

On another note, how does the sPAPR kernel manage memory?
Is it possible to add a hole in RAM between 1 TiB and 1 TiB + 64 GiB to
allow the kernel to register the I/O space?

Laurent
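
For context, QEMU's versioned machine types carry compat settings as
driver/property/value triples. A sketch of the kind of entry Laurent is
suggesting might look as follows; the "phb0-base" property is hypothetical
(nothing in this patch defines it), and only the triple idiom itself is real:

/* Hypothetical compat entry, in the shape of QEMU's GlobalProperty
 * triples.  "phb0-base" is an invented property name, shown only to
 * illustrate the suggestion; the patch adds no such property. */
#define SPAPR_COMPAT_2_7                                            \
    {                                                               \
        .driver   = "spapr-pci-host-bridge",                        \
        .property = "phb0-base",          /* hypothetical */        \
        .value    = "0x10000000000",      /* old 1 TiB default */   \
    },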
David Gibson Oct. 6, 2016, 8:46 a.m. UTC | #2
On Thu, Oct 06, 2016 at 09:21:56AM +0200, Laurent Vivier wrote:
> 
> 
> On 06/10/2016 05:03, David Gibson wrote:
> > Currently the default PCI host bridge for the 'pseries' machine type is
> > constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
> > guest memory space.  This means that if > 1TiB of guest RAM is specified,
> > the RAM will collide with the PCI IO windows, causing serious problems.
> > 
> > Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
> > there's a little unused space at the bottom of the area reserved for PCI,
> > but essentially this means that > 1TiB of RAM has never worked with the
> > pseries machine type.
> > 
> > This patch fixes this by altering the placement of PHBs on large-RAM VMs.
> > Instead of always placing the first PHB at 1TiB, it is placed at the next
> > 1 TiB boundary after the maximum RAM address.
> > 
> > Technically, this changes behaviour in a migration-breaking way for
> > existing machines with > 1TiB maximum memory, but since having > 1 TiB
> > memory was broken anyway, this seems like a reasonable trade-off.
> 
> Perhaps you could add an SPAPR_COMPAT_XX property with the PHB0 base so as
> not to break compatibility? I think spapr without a PCI card (only VIO, for
> instance) should work with more than 1 TiB.

No, it won't, because we always create the default PHB even if we don't
actually have any PCI devices.

> On another note, how does the sPAPR kernel manage memory?
> Is it possible to add a hole in RAM between 1 TiB and 1 TiB + 64 GiB to
> allow the kernel to register the I/O space?

Possible, yes, but I don't really see any advantage to it.
Laurent Vivier Oct. 6, 2016, 9:36 a.m. UTC | #3
On 06/10/2016 05:03, David Gibson wrote:
> Currently the default PCI host bridge for the 'pseries' machine type is
> constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
> guest memory space.  This means that if > 1TiB of guest RAM is specified,
> the RAM will collide with the PCI IO windows, causing serious problems.
> 
> Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
> there's a little unused space at the bottom of the area reserved for PCI,
> but essentially this means that > 1TiB of RAM has never worked with the
> pseries machine type.
> 
> This patch fixes this by altering the placement of PHBs on large-RAM VMs.
> Instead of always placing the first PHB at 1TiB, it is placed at the next
> 1 TiB boundary after the maximum RAM address.
> 
> Technically, this changes behaviour in a migration-breaking way for
> existing machines with > 1TiB maximum memory, but since having > 1 TiB
> memory was broken anyway, this seems like a reasonable trade-off.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Reviewed-by: Laurent Vivier <lvivier@redhat.com>

> ---
>  hw/ppc/spapr.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f6e9c2a..9f3e004 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2376,12 +2376,15 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
>                                  unsigned n_dma, uint32_t *liobns, Error **errp)
>  {
>      const uint64_t base_buid = 0x800000020000000ULL;
> -    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
>      const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
>      const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
>      const hwaddr pio_offset = 0x80000000; /* 2 GiB */
>      const uint32_t max_index = 255;
> +    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
>  
> +    uint64_t max_hotplug_addr = spapr->hotplug_memory.base +
> +        memory_region_size(&spapr->hotplug_memory.mr);
> +    hwaddr phb0_base = QEMU_ALIGN_UP(max_hotplug_addr, phb0_alignment);
>      hwaddr phb_base;
>      int i;
>  
>
Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f6e9c2a..9f3e004 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2376,12 +2376,15 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
                                 unsigned n_dma, uint32_t *liobns, Error **errp)
 {
     const uint64_t base_buid = 0x800000020000000ULL;
-    const hwaddr phb0_base = 0x10000000000ULL; /* 1 TiB */
     const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */
     const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */
     const hwaddr pio_offset = 0x80000000; /* 2 GiB */
     const uint32_t max_index = 255;
+    const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */
 
+    uint64_t max_hotplug_addr = spapr->hotplug_memory.base +
+        memory_region_size(&spapr->hotplug_memory.mr);
+    hwaddr phb0_base = QEMU_ALIGN_UP(max_hotplug_addr, phb0_alignment);
     hwaddr phb_base;
     int i;
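
As a quick sanity check of the new expression, with invented values for the
hotplug-memory region (the real base and size depend on the machine's maxmem
configuration):

#include <assert.h>
#include <stdint.h>

/* Copies of QEMU's osdep.h alignment helpers, so the check is standalone. */
#define QEMU_ALIGN_DOWN(n, m) (((n) / (m)) * (m))
#define QEMU_ALIGN_UP(n, m)   QEMU_ALIGN_DOWN((n) + (m) - 1, (m))

int main(void)
{
    const uint64_t phb0_alignment = 0x10000000000ULL;   /* 1 TiB */

    /* Invented layout: hotplug memory starts at 1.5 TiB and spans 256 GiB,
     * so the highest possible RAM address is 1.75 TiB. */
    uint64_t hotplug_base     = 0x18000000000ULL;       /* 1.5 TiB */
    uint64_t hotplug_size     = 0x4000000000ULL;        /* 256 GiB */
    uint64_t max_hotplug_addr = hotplug_base + hotplug_size;

    /* PHB0 lands on the next 1 TiB boundary: 2 TiB, above all RAM. */
    assert(QEMU_ALIGN_UP(max_hotplug_addr, phb0_alignment)
           == 0x20000000000ULL);
    return 0;
}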