diff mbox series

[v5] limit physical address space size

Message ID 20231107130309.3257776-1-kraxel@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v5] limit physical address space size | expand

Commit Message

Gerd Hoffmann Nov. 7, 2023, 1:03 p.m. UTC
For better compatibility with old linux kernels,
see source code comment.

Related (same problem in ovmf):
https://github.com/tianocore/edk2/commit/c1e853769046

Cc: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 src/fw/paravirt.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Kevin O'Connor Nov. 8, 2023, 6:35 p.m. UTC | #1
On Tue, Nov 07, 2023 at 02:03:09PM +0100, Gerd Hoffmann wrote:
> For better compatibility with old linux kernels,
> see source code comment.
> 
> Related (same problem in ovmf):
> https://github.com/tianocore/edk2/commit/c1e853769046

Thanks.  I'll defer to your judgement on this.  It does seem a little
odd to alter the CPUPhysBits variable itself instead of adding
additional checking to the pciinit.c code that uses CPUPhysBits.
(That is, if there are old Linux kernels that can't handle a very high
PCI space, then a workaround in the PCI code might make it more clear
what is occurring.)

Cheers,
-Kevin


> 
> Cc: Claudio Fontana <cfontana@suse.de>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  src/fw/paravirt.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/src/fw/paravirt.c b/src/fw/paravirt.c
> index e5d4eca0cb5a..a2c9c64d5e78 100644
> --- a/src/fw/paravirt.c
> +++ b/src/fw/paravirt.c
> @@ -182,6 +182,14 @@ static void physbits(int qemu_quirk)
>              __func__, signature, pae ? "yes" : "no", CPULongMode ? "yes" : "no",
>              physbits, valid ? "yes" : "no");
>  
> +    if (valid && physbits > 46) {
> +        // Old linux kernels have trouble dealing with more than 46
> +        // phys-bits, so avoid that for now.  Seems to be a bug in the
> +        // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
> +        dprintf(1, "%s: using phys-bits=46 (old linux kernel compatibility)\n", __func__);
> +        physbits = 46;
> +    }
> +
>      if (valid)
>          CPUPhysBits = physbits;
>  }
> -- 
> 2.41.0
> 
> _______________________________________________
> SeaBIOS mailing list -- seabios@seabios.org
> To unsubscribe send an email to seabios-leave@seabios.org
Claudio Fontana Nov. 10, 2023, 8:36 a.m. UTC | #2
On 11/8/23 19:35, Kevin O'Connor wrote:
> On Tue, Nov 07, 2023 at 02:03:09PM +0100, Gerd Hoffmann wrote:
>> For better compatibility with old linux kernels,
>> see source code comment.
>>
>> Related (same problem in ovmf):
>> https://github.com/tianocore/edk2/commit/c1e853769046
> 
> Thanks.  I'll defer to your judgement on this.  It does seem a little
> odd to alter the CPUPhysBits variable itself instead of adding
> additional checking to the pciinit.c code that uses CPUPhysBits.
> (That is, if there are old Linux kernels that can't handle a very high
> PCI space, then a workaround in the PCI code might make it more clear
> what is occurring.)
> 
> Cheers,
> -Kevin
> 
> 
>>
>> Cc: Claudio Fontana <cfontana@suse.de>
>> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Hi,

I thought about this, and I am not sure it's the right call though.

The change breaking old guests (CentOS7, Ubuntu 18.04 are the known ones, but presumably others as well) in QEMU is:

commit 784155cdcb02ffaae44afecab93861070e7d652d
Author: Gerd Hoffmann <kraxel@redhat.com>
Date:   Mon Sep 11 17:20:26 2023 +0200

    seabios: update submodule to git snapshot
    
    git shortlog
    ------------
    
    Gerd Hoffmann (7):
          disable array bounds warning
          better kvm detection
          detect physical address space size
          move 64bit pci window to end of address space
          be less conservative with the 64bit pci io window
          qemu: log reservations in fw_cfg e820 table
          check for e820 conflict

The reasoning for the change is according to:

https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/message/BHK67ULKJTLJT676J4D5C2ND53CFEL3Q/

"It makes seabios follow a common pattern.  real mode address space
has io resources mapped high (below 1M).  32-bit address space has
io resources mapped high too (below 4G).  This does the same for
64-bit resources."

So it seems to be an almost aesthetic choice, the way I read the "common pattern", one that ends up actually breaking existing workloads though.

Now the correction to that that you propose in SeaBIOS is:

>> ---
>>  src/fw/paravirt.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/src/fw/paravirt.c b/src/fw/paravirt.c
>> index e5d4eca0cb5a..a2c9c64d5e78 100644
>> --- a/src/fw/paravirt.c
>> +++ b/src/fw/paravirt.c
>> @@ -182,6 +182,14 @@ static void physbits(int qemu_quirk)
>>              __func__, signature, pae ? "yes" : "no", CPULongMode ? "yes" : "no",
>>              physbits, valid ? "yes" : "no");
>>  
>> +    if (valid && physbits > 46) {
>> +        // Old linux kernels have trouble dealing with more than 46
>> +        // phys-bits, so avoid that for now.  Seems to be a bug in the
>> +        // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
>> +        dprintf(1, "%s: using phys-bits=46 (old linux kernel compatibility)\n", __func__);
>> +        physbits = 46;
>> +    }
>> +
>>      if (valid)
>>          CPUPhysBits = physbits;
>>  }
>> -- 
>> 2.41.0

but to me this is potentially breaking the situation for another set of users,
those that are passing through 48 physical bits from their host cpu to the guest,
and expect it to actually happen.

Would it not be a better solution to instead either revert the original change,
or to patch it to find a new range that better satisfies code consistency/aesthetic requirements,
but also does not break any running workload?

I could help with the testing for this.

Or we could add some extra workarounds in the stack elsewhere in the management tools to
try to detect older guests (not ideal either), but this seems like adding breakage on top of breakage to me.

Might be wrong though, looking forward for opinions across Seabios and QEMU.

Thanks,

Claudio
Gerd Hoffmann Nov. 10, 2023, 10:51 a.m. UTC | #3
On Fri, Nov 10, 2023 at 09:36:02AM +0100, Claudio Fontana wrote:
> On 11/8/23 19:35, Kevin O'Connor wrote:
> > On Tue, Nov 07, 2023 at 02:03:09PM +0100, Gerd Hoffmann wrote:
> >> For better compatibility with old linux kernels,
> >> see source code comment.
> >>
> >> Related (same problem in ovmf):
> >> https://github.com/tianocore/edk2/commit/c1e853769046
> > 
> > Thanks.  I'll defer to your judgement on this.  It does seem a little
> > odd to alter the CPUPhysBits variable itself instead of adding
> > additional checking to the pciinit.c code that uses CPUPhysBits.
> > (That is, if there are old Linux kernels that can't handle a very high
> > PCI space, then a workaround in the PCI code might make it more clear
> > what is occurring.)
> > 
> > Cheers,
> > -Kevin
> > 
> > 
> >>
> >> Cc: Claudio Fontana <cfontana@suse.de>
> >> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> 
> Hi,
> 
> I thought about this, and I am not sure it's the right call though.
> 
> The change breaking old guests (CentOS7, Ubuntu 18.04 are the known ones, but presumably others as well) in QEMU is:
> 
> commit 784155cdcb02ffaae44afecab93861070e7d652d
> Author: Gerd Hoffmann <kraxel@redhat.com>
> Date:   Mon Sep 11 17:20:26 2023 +0200
> 
>     seabios: update submodule to git snapshot
>     
>     git shortlog
>     ------------
>     
>     Gerd Hoffmann (7):
>           disable array bounds warning
>           better kvm detection
>           detect physical address space size
>           move 64bit pci window to end of address space
>           be less conservative with the 64bit pci io window
>           qemu: log reservations in fw_cfg e820 table
>           check for e820 conflict
> 
> The reasoning for the change is according to:
> 
> https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/message/BHK67ULKJTLJT676J4D5C2ND53CFEL3Q/
> 
> "It makes seabios follow a common pattern.  real mode address space
> has io resources mapped high (below 1M).  32-bit address space has
> io resources mapped high too (below 4G).  This does the same for
> 64-bit resources."
> 
> So it seems to be an almost aesthetic choice, the way I read the
> "common pattern", one that ends up actually breaking existing
> workloads though.

It's not about aesthetics.  It's about handing out more resources to the
64bit mmio window and also to the pci bridge windows if we have enough
address space for that.  This is important because not only main memory
sizes grow, but also device memory grows.  Especially GPUs and AI
accelerators can have rather big PCI memory bars these days.  If you
want support hotplugging them you need pcie root ports with big enough
bridge windows.

The "common pattern" refers to OVMF doing the same for the same reason.

> Now the correction to that that you propose in SeaBIOS is:
> 
> >> +    if (valid && physbits > 46) {
> >> +        // Old linux kernels have trouble dealing with more than 46
> >> +        // phys-bits, so avoid that for now.  Seems to be a bug in the
> >> +        // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
> >> +        dprintf(1, "%s: using phys-bits=46 (old linux kernel compatibility)\n", __func__);
> >> +        physbits = 46;
> >> +    }

> but to me this is potentially breaking the situation for another set
> of users, those that are passing through 48 physical bits from their
> host cpu to the guest, and expect it to actually happen.

This doesn't change the address space size the guest does see.  seabios
does not have the power to change the information returned by the cpuid
instruction.

This only changes the placement of the PCI bars.  The pci setup code is
the only consumer of that variable, guess it makes sense to move the
quirk to the pci code (as suggested by Kevin) to clarify this.

> Would it not be a better solution to instead either revert the
> original change,

No (see above).

> or to patch it to find a new range that better satisfies code
> consistency/aesthetic requirements, but also does not break any
> running workload?

Upstream OVMF does the same thing for roughly one year, the old linux
kernel issue is the only one which showed up so far.  OVMF wouldn't see
*really* old guests (without UEFI support) though.

The pci setup code already has a bunch of conditions for activating the
64bit mmio window, to avoid breaking really old guests.  It wouldn't
happen on CPUs without long mode.  It also wouldn't happen if guests
don't have memory above 4G.

I'm open to suggestions to refine this.

> Or we could add some extra workarounds in the stack elsewhere in the
> management tools to try to detect older guests (not ideal either), but
> this seems like adding breakage on top of breakage to me.

We already have libosinfo which records guest capabilities, which exists
to solve exactly that problem:  Allow new guests use new features by
default, without breaking old guests.  Extending libosinfo looks like a
perfectly valid approach to me, especially considering that we probably
want make the 46 phys-bits limit conditional some day in the future, to
allow even larger guests.

take care,
  Gerd
Gerd Hoffmann Nov. 10, 2023, 11:04 a.m. UTC | #4
Hi,

> This only changes the placement of the PCI bars.  The pci setup code is
> the only consumer of that variable, guess it makes sense to move the
> quirk to the pci code (as suggested by Kevin) to clarify this.

i.e. like this:

From d538dc7d4316e557ae302464252444d09de0681d Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Tue, 7 Nov 2023 13:49:31 +0100
Subject: [PATCH] limit physical address space size

For better compatibility with old linux kernels,
see source code comment.

Related (same problem in ovmf):
https://github.com/tianocore/edk2/commit/c1e853769046

Reported-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 src/fw/pciinit.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index c7084f5e397e..7aeea61bfd05 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -52,6 +52,7 @@ u64 pcimem64_start = BUILD_PCIMEM64_START;
 u64 pcimem64_end   = BUILD_PCIMEM64_END;
 u64 pci_io_low_end = 0xa000;
 u32 pci_use_64bit  = 0;
+u32 pci_phys_bits  = 0;
 
 struct pci_region_entry {
     struct pci_device *dev;
@@ -967,7 +968,7 @@ static int pci_bios_check_devices(struct pci_bus *busses)
             if (hotplug_support == HOTPLUG_PCIE)
                 resource_optional = pcie_cap && (type == PCI_REGION_TYPE_IO);
             if (hotplug_support && pci_use_64bit && is64 && (type == PCI_REGION_TYPE_PREFMEM))
-                align = (u64)1 << (CPUPhysBits - 11);
+                align = (u64)1 << (pci_phys_bits - 11);
             if (align > sum && hotplug_support && !resource_optional)
                 sum = align; /* reserve min size for hot-plug */
             if (size > sum) {
@@ -1131,12 +1132,12 @@ static void pci_bios_map_devices(struct pci_bus *busses)
         r64_mem.base = le64_to_cpu(romfile_loadint("etc/reserved-memory-end", 0));
         if (r64_mem.base < 0x100000000LL + RamSizeOver4G)
             r64_mem.base = 0x100000000LL + RamSizeOver4G;
-        if (CPUPhysBits) {
-            u64 top = 1LL << CPUPhysBits;
+        if (pci_phys_bits) {
+            u64 top = 1LL << pci_phys_bits;
             u64 size = (ALIGN(sum_mem, (1LL<<30)) +
                         ALIGN(sum_pref, (1LL<<30)));
             if (pci_use_64bit)
-                size = ALIGN(size, (1LL<<(CPUPhysBits-3)));
+                size = ALIGN(size, (1LL<<(pci_phys_bits-3)));
             if (r64_mem.base < top - size) {
                 r64_mem.base = top - size;
             }
@@ -1181,8 +1182,16 @@ pci_setup(void)
 
     dprintf(3, "pci setup\n");
 
-    if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G)
+    if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G) {
         pci_use_64bit = 1;
+        pci_phys_bits = CPUPhysBits;
+        if (pci_phys_bits > 46) {
+            // Old linux kernels have trouble dealing with more than 46
+            // phys-bits, so avoid that for now.  Seems to be a bug in the
+            // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
+            pci_phys_bits = 46;
+        }
+    }
 
     dprintf(1, "=== PCI bus & bridge init ===\n");
     if (pci_probe_host() != 0) {
Kevin O'Connor Nov. 10, 2023, 4:44 p.m. UTC | #5
On Fri, Nov 10, 2023 at 12:04:24PM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > This only changes the placement of the PCI bars.  The pci setup code is
> > the only consumer of that variable, guess it makes sense to move the
> > quirk to the pci code (as suggested by Kevin) to clarify this.
> 
> i.e. like this:
> 
> From d538dc7d4316e557ae302464252444d09de0681d Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Tue, 7 Nov 2023 13:49:31 +0100
> Subject: [PATCH] limit physical address space size
> 
> For better compatibility with old linux kernels,
> see source code comment.
> 
> Related (same problem in ovmf):
> https://github.com/tianocore/edk2/commit/c1e853769046
> 
> Reported-by: Claudio Fontana <cfontana@suse.de>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  src/fw/pciinit.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
> index c7084f5e397e..7aeea61bfd05 100644
> --- a/src/fw/pciinit.c
> +++ b/src/fw/pciinit.c
> @@ -52,6 +52,7 @@ u64 pcimem64_start = BUILD_PCIMEM64_START;
>  u64 pcimem64_end   = BUILD_PCIMEM64_END;
>  u64 pci_io_low_end = 0xa000;
>  u32 pci_use_64bit  = 0;
> +u32 pci_phys_bits  = 0;

FWIW, all these flags are getting a bit confusing.  Maybe do something
like the patch below.

>  
>  struct pci_region_entry {
>      struct pci_device *dev;
> @@ -967,7 +968,7 @@ static int pci_bios_check_devices(struct pci_bus *busses)
>              if (hotplug_support == HOTPLUG_PCIE)
>                  resource_optional = pcie_cap && (type == PCI_REGION_TYPE_IO);
>              if (hotplug_support && pci_use_64bit && is64 && (type == PCI_REGION_TYPE_PREFMEM))
> -                align = (u64)1 << (CPUPhysBits - 11);
> +                align = (u64)1 << (pci_phys_bits - 11);
>              if (align > sum && hotplug_support && !resource_optional)
>                  sum = align; /* reserve min size for hot-plug */
>              if (size > sum) {
> @@ -1131,12 +1132,12 @@ static void pci_bios_map_devices(struct pci_bus *busses)
>          r64_mem.base = le64_to_cpu(romfile_loadint("etc/reserved-memory-end", 0));
>          if (r64_mem.base < 0x100000000LL + RamSizeOver4G)
>              r64_mem.base = 0x100000000LL + RamSizeOver4G;
> -        if (CPUPhysBits) {
> -            u64 top = 1LL << CPUPhysBits;
> +        if (pci_phys_bits) {

FYI, this is a change in behavior - previously this condition would
have been taken even if CPULongMode or RamSizeOver4G is false.  I'm
not sure if this change is intentional.  (My example patch below
follows your lead here.)

> +            u64 top = 1LL << pci_phys_bits;
>              u64 size = (ALIGN(sum_mem, (1LL<<30)) +
>                          ALIGN(sum_pref, (1LL<<30)));
>              if (pci_use_64bit)
> -                size = ALIGN(size, (1LL<<(CPUPhysBits-3)));
> +                size = ALIGN(size, (1LL<<(pci_phys_bits-3)));
>              if (r64_mem.base < top - size) {
>                  r64_mem.base = top - size;
>              }
> @@ -1181,8 +1182,16 @@ pci_setup(void)
>  
>      dprintf(3, "pci setup\n");
>  
> -    if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G)
> +    if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G) {
>          pci_use_64bit = 1;
> +        pci_phys_bits = CPUPhysBits;
> +        if (pci_phys_bits > 46) {
> +            // Old linux kernels have trouble dealing with more than 46
> +            // phys-bits, so avoid that for now.  Seems to be a bug in the
> +            // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
> +            pci_phys_bits = 46;
> +        }
> +    }
>  
>      dprintf(1, "=== PCI bus & bridge init ===\n");
>      if (pci_probe_host() != 0) {
> -- 
> 2.41.0
>


Possible alternate variable naming:

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index c7084f5..cd64d6e 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -46,12 +46,16 @@ static const char *region_type_name[] = {
     [ PCI_REGION_TYPE_PREFMEM ] = "prefmem",
 };
 
+// Memory ranges exported to legacy ACPI type table generation
 u64 pcimem_start   = BUILD_PCIMEM_START;
 u64 pcimem_end     = BUILD_PCIMEM_END;
 u64 pcimem64_start = BUILD_PCIMEM64_START;
 u64 pcimem64_end   = BUILD_PCIMEM64_END;
-u64 pci_io_low_end = 0xa000;
-u32 pci_use_64bit  = 0;
+
+// Memory resource allocation tracking
+static u64 pci_io_low_end = 0xa000;
+static u32 pci_use_64bit  = 0;
+static u64 pci_mem64_top  = 0;
 
 struct pci_region_entry {
     struct pci_device *dev;
@@ -966,8 +970,9 @@ static int pci_bios_check_devices(struct pci_bus *busses)
             int resource_optional = 0;
             if (hotplug_support == HOTPLUG_PCIE)
                 resource_optional = pcie_cap && (type == PCI_REGION_TYPE_IO);
-            if (hotplug_support && pci_use_64bit && is64 && (type == PCI_REGION_TYPE_PREFMEM))
-                align = (u64)1 << (CPUPhysBits - 11);
+            if (hotplug_support && pci_use_64bit && is64
+                && (type == PCI_REGION_TYPE_PREFMEM))
+                align = pci_mem64_top >> 11;
             if (align > sum && hotplug_support && !resource_optional)
                 sum = align; /* reserve min size for hot-plug */
             if (size > sum) {
@@ -1131,14 +1136,13 @@ static void pci_bios_map_devices(struct pci_bus *busses)
         r64_mem.base = le64_to_cpu(romfile_loadint("etc/reserved-memory-end", 0));
         if (r64_mem.base < 0x100000000LL + RamSizeOver4G)
             r64_mem.base = 0x100000000LL + RamSizeOver4G;
-        if (CPUPhysBits) {
-            u64 top = 1LL << CPUPhysBits;
+        if (pci_mem64_top) {
             u64 size = (ALIGN(sum_mem, (1LL<<30)) +
                         ALIGN(sum_pref, (1LL<<30)));
             if (pci_use_64bit)
-                size = ALIGN(size, (1LL<<(CPUPhysBits-3)));
-            if (r64_mem.base < top - size) {
-                r64_mem.base = top - size;
+                size = ALIGN(size, pci_mem64_top >> 3);
+            if (r64_mem.base < pci_mem64_top - size) {
+                r64_mem.base = pci_mem64_top - size;
             }
             if (e820_is_used(r64_mem.base, size))
                 r64_mem.base -= size;
@@ -1181,8 +1185,16 @@ pci_setup(void)
 
     dprintf(3, "pci setup\n");
 
-    if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G)
+    if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G) {
         pci_use_64bit = 1;
+        pci_mem64_top = 1LL << CPUPhysBits;
+        if (CPUPhysBits > 46) {
+            // Old linux kernels have trouble dealing with more than 46
+            // phys-bits, so avoid that for now.  Seems to be a bug in the
+            // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
+            pci_mem64_top = 1LL << 46;
+        }
+    }
 
     dprintf(1, "=== PCI bus & bridge init ===\n");
     if (pci_probe_host() != 0) {


Cheers,
-Kevin
Kevin O'Connor Nov. 10, 2023, 5:05 p.m. UTC | #6
On Fri, Nov 10, 2023 at 11:44:48AM -0500, Kevin O'Connor wrote:
> On Fri, Nov 10, 2023 at 12:04:24PM +0100, Gerd Hoffmann wrote:
> > -        if (CPUPhysBits) {
> > -            u64 top = 1LL << CPUPhysBits;
> > +        if (pci_phys_bits) {
> 
> FYI, this is a change in behavior - previously this condition would
> have been taken even if CPULongMode or RamSizeOver4G is false.  I'm
> not sure if this change is intentional.  (My example patch below
> follows your lead here.)
> 

On closer inspection, I think this change in behavior was not
intended.  How about variable names like the below instead.

-Kevin


diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index c7084f5..6b13cd5 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -46,12 +46,16 @@ static const char *region_type_name[] = {
     [ PCI_REGION_TYPE_PREFMEM ] = "prefmem",
 };
 
+// Memory ranges exported to legacy ACPI type table generation
 u64 pcimem_start   = BUILD_PCIMEM_START;
 u64 pcimem_end     = BUILD_PCIMEM_END;
 u64 pcimem64_start = BUILD_PCIMEM64_START;
 u64 pcimem64_end   = BUILD_PCIMEM64_END;
-u64 pci_io_low_end = 0xa000;
-u32 pci_use_64bit  = 0;
+
+// Resource allocation limits
+static u64 pci_io_low_end = 0xa000;
+static u64 pci_mem64_top  = 0;
+static u32 pci_pad_mem64  = 0;
 
 struct pci_region_entry {
     struct pci_device *dev;
@@ -966,8 +970,9 @@ static int pci_bios_check_devices(struct pci_bus *busses)
             int resource_optional = 0;
             if (hotplug_support == HOTPLUG_PCIE)
                 resource_optional = pcie_cap && (type == PCI_REGION_TYPE_IO);
-            if (hotplug_support && pci_use_64bit && is64 && (type == PCI_REGION_TYPE_PREFMEM))
-                align = (u64)1 << (CPUPhysBits - 11);
+            if (hotplug_support && pci_pad_mem64 && is64
+                && (type == PCI_REGION_TYPE_PREFMEM))
+                align = pci_mem64_top >> 11;
             if (align > sum && hotplug_support && !resource_optional)
                 sum = align; /* reserve min size for hot-plug */
             if (size > sum) {
@@ -1111,7 +1116,7 @@ static void pci_bios_map_devices(struct pci_bus *busses)
         panic("PCI: out of I/O address space\n");
 
     dprintf(1, "PCI: 32: %016llx - %016llx\n", pcimem_start, pcimem_end);
-    if (pci_use_64bit || pci_bios_init_root_regions_mem(busses)) {
+    if (pci_pad_mem64 || pci_bios_init_root_regions_mem(busses)) {
         struct pci_region r64_mem, r64_pref;
         r64_mem.list.first = NULL;
         r64_pref.list.first = NULL;
@@ -1131,14 +1136,13 @@ static void pci_bios_map_devices(struct pci_bus *busses)
         r64_mem.base = le64_to_cpu(romfile_loadint("etc/reserved-memory-end", 0));
         if (r64_mem.base < 0x100000000LL + RamSizeOver4G)
             r64_mem.base = 0x100000000LL + RamSizeOver4G;
-        if (CPUPhysBits) {
-            u64 top = 1LL << CPUPhysBits;
+        if (pci_mem64_top) {
             u64 size = (ALIGN(sum_mem, (1LL<<30)) +
                         ALIGN(sum_pref, (1LL<<30)));
-            if (pci_use_64bit)
-                size = ALIGN(size, (1LL<<(CPUPhysBits-3)));
-            if (r64_mem.base < top - size) {
-                r64_mem.base = top - size;
+            if (pci_pad_mem64)
+                size = ALIGN(size, pci_mem64_top >> 3);
+            if (r64_mem.base < pci_mem64_top - size) {
+                r64_mem.base = pci_mem64_top - size;
             }
             if (e820_is_used(r64_mem.base, size))
                 r64_mem.base -= size;
@@ -1181,8 +1185,18 @@ pci_setup(void)
 
     dprintf(3, "pci setup\n");
 
+    if (CPUPhysBits) {
+        pci_mem64_top = 1LL << CPUPhysBits;
+        if (CPUPhysBits > 46) {
+            // Old linux kernels have trouble dealing with more than 46
+            // phys-bits, so avoid that for now.  Seems to be a bug in the
+            // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
+            pci_mem64_top = 1LL << 46;
+        }
+    }
+
     if (CPUPhysBits >= 36 && CPULongMode && RamSizeOver4G)
-        pci_use_64bit = 1;
+        pci_pad_mem64 = 1;
 
     dprintf(1, "=== PCI bus & bridge init ===\n");
     if (pci_probe_host() != 0) {
diff mbox series

Patch

diff --git a/src/fw/paravirt.c b/src/fw/paravirt.c
index e5d4eca0cb5a..a2c9c64d5e78 100644
--- a/src/fw/paravirt.c
+++ b/src/fw/paravirt.c
@@ -182,6 +182,14 @@  static void physbits(int qemu_quirk)
             __func__, signature, pae ? "yes" : "no", CPULongMode ? "yes" : "no",
             physbits, valid ? "yes" : "no");
 
+    if (valid && physbits > 46) {
+        // Old linux kernels have trouble dealing with more than 46
+        // phys-bits, so avoid that for now.  Seems to be a bug in the
+        // virtio-pci driver.  Reported: centos-7, ubuntu-18.04
+        dprintf(1, "%s: using phys-bits=46 (old linux kernel compatibility)\n", __func__);
+        physbits = 46;
+    }
+
     if (valid)
         CPUPhysBits = physbits;
 }