[2/2] hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

Message ID 20180924221346.16733-3-lersek@redhat.com (mailing list archive)
State New, archived
Series hw/pci-host/x86: extend the 64-bit PCI hole relative to the fw-assigned base

Commit Message

Laszlo Ersek Sept. 24, 2018, 10:13 p.m. UTC
In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
64-bit MMIO aperture to the guest OS for hotplug purposes.

In that commit, we added or modified five functions:

- pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
  64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
  the DIMM hotplug area too (if any).

- i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
  board-specific 64-bit base property getters called abstractly by the
  ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
  firmware didn't program any 64-bit hole (i.e. if the firmware didn't
  assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
  honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
  GPA programmed by the firmware as the base address for the aperture).

- i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
  these were intended to extend the aperture to our size recommendation,
  calculated relative to the base of the aperture.

Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
q35_host_get_pci_hole64_end() currently only extend the aperture relative
to the default base (pc_pci_hole64_start()), ignoring any programming done
by the firmware. This means that our size recommendation may not be met.
Fix it by honoring the firmware's address assignments.
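The intended calculation can be modeled with a short sketch. This is a
simplified stand-in, not QEMU's actual code: the helper name and the 1 GiB
round-up granularity are assumptions for illustration; "hole64_size" stands
for the board's recommended minimum aperture (2GB on i440fx, 32GB on q35).

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Simplified model of the aperture-end computation -- not QEMU's actual
 * code. The end is the recommended size added to the base, rounded up to
 * an assumed 1 GiB boundary. Before the patch, "hole64_start" was always
 * the default base; after the patch, it honors the firmware-programmed
 * base when one exists. */
static uint64_t hole64_end(uint64_t hole64_start, uint64_t hole64_size)
{
    uint64_t end = hole64_start + hole64_size;
    return (end + GiB - 1) & ~(GiB - 1);    /* round up to 1 GiB */
}
```

In the i440fx example below, extending from the default 6GB base yields an
8GB end, below the firmware-programmed 32GB base; extending from the 32GB
base itself yields 34GB, matching the post-patch _CRS.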

The strange extension sizes were spotted by Alex, in the log of a guest
kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).

This change only affects DSDT generation, therefore no new compat property
is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
64-bit BARs, the patch makes no difference to SeaBIOS guests. (Which is in
turn why ACPI test data for the "bios-tables-test" need not be refreshed.)

Using an i440fx OVMF guest with 5GB RAM, an example _CRS change is:

> @@ -881,9 +881,9 @@
>              QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
>                  0x0000000000000000, // Granularity
>                  0x0000000800000000, // Range Minimum
> -                0x000000080001C0FF, // Range Maximum
> +                0x000000087FFFFFFF, // Range Maximum
>                  0x0000000000000000, // Translation Offset
> -                0x000000000001C100, // Length
> +                0x0000000080000000, // Length
>                  ,, , AddressRangeMemory, TypeStatic)
>          })
>          Device (GPE0)

(On i440fx, the low RAM split is at 3GB, in this case. Therefore, with 5GB
guest RAM and no DIMM hotplug range, pc_pci_hole64_start() returns 4 +
(5-3) = 6 GB. Adding the 2GB extension to that yields 8GB, which is below
the firmware-programmed base of 32GB, before the patch. Therefore, before
the patch, the extension is ineffective. After the patch, we add the 2GB
extension to the firmware-programmed base, namely 32GB.)

Using a q35 OVMF guest with 5GB RAM, an example _CRS change is:

> @@ -3162,9 +3162,9 @@
>              QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
>                  0x0000000000000000, // Granularity
>                  0x0000000800000000, // Range Minimum
> -                0x00000009BFFFFFFF, // Range Maximum
> +                0x0000000FFFFFFFFF, // Range Maximum
>                  0x0000000000000000, // Translation Offset
> -                0x00000001C0000000, // Length
> +                0x0000000800000000, // Length
>                  ,, , AddressRangeMemory, TypeStatic)
>          })
>          Device (GPE0)

(On Q35, the low RAM split is at 2GB. Therefore, with 5GB guest RAM and no
DIMM hotplug range, pc_pci_hole64_start() returns 4 + (5-2) = 7 GB. Adding
the 32GB extension to that yields 39GB (0x0000_0009_BFFF_FFFF + 1), before
the patch. After the patch, we add the 32GB extension to the
firmware-programmed base, namely 32GB.)
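The _CRS figures in both examples can be cross-checked with plain
arithmetic; this is independent of QEMU, and the helper name below is made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Compute the inclusive "Range Maximum" from a _CRS "Range Minimum" and
 * "Length" pair -- a pure arithmetic cross-check of the numbers quoted
 * above. */
static uint64_t range_max(uint64_t range_min, uint64_t length)
{
    return range_min + length - 1;
}
```

For instance, the pre-patch q35 length 0x1C0000000 is 7GB, so the aperture
ends exactly at the 39GB mark computed in the paragraph above.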

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Link: http://mid.mail-archive.com/a56b3710-9c2d-9ad0-5590-efe30b6d7bd9@redhat.com
Fixes: 9fa99d2519cbf71f871e46871df12cb446dc1c3e
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 hw/pci-host/piix.c | 2 +-
 hw/pci-host/q35.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Marcel Apfelbaum Sept. 25, 2018, 6:07 p.m. UTC | #1
Hi Laszlo,

On 9/25/18 1:13 AM, Laszlo Ersek wrote:
> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>

[...]

> Also, because SeaBIOS never assigns 64-bit GPAs to
> 64-bit BARs, the patch makes no difference to SeaBIOS guests.

And this is why we didn't catch that earlier... we need to include
OVMF in our basic tests.

[...]
> diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
> index 0df91e002076..fb4b0669ac9f 100644
> --- a/hw/pci-host/piix.c
> +++ b/hw/pci-host/piix.c
> @@ -285,7 +285,7 @@ static void i440fx_pcihost_get_pci_hole64_end(Object *obj, Visitor *v,
>   {
>       PCIHostState *h = PCI_HOST_BRIDGE(obj);
>       I440FXState *s = I440FX_PCI_HOST_BRIDGE(obj);
> -    uint64_t hole64_start = pc_pci_hole64_start();
> +    uint64_t hole64_start = i440fx_pcihost_get_pci_hole64_start_value(obj);
>       Range w64;
>       uint64_t value, hole64_end;
>   
> diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> index 8acf942b5e65..1c5433161f14 100644
> --- a/hw/pci-host/q35.c
> +++ b/hw/pci-host/q35.c
> @@ -145,7 +145,7 @@ static void q35_host_get_pci_hole64_end(Object *obj, Visitor *v,
>   {
>       PCIHostState *h = PCI_HOST_BRIDGE(obj);
>       Q35PCIHost *s = Q35_HOST_DEVICE(obj);
> -    uint64_t hole64_start = pc_pci_hole64_start();
> +    uint64_t hole64_start = q35_host_get_pci_hole64_start_value(obj);
>       Range w64;
>       uint64_t value, hole64_end;
>   


Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com>


Thanks,
Marcel
Alex Williamson Sept. 25, 2018, 8:36 p.m. UTC | #2
On Tue, 25 Sep 2018 00:13:46 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> 64-bit MMIO aperture to the guest OS for hotplug purposes.
> 
> [...]
> 
> This change only affects DSDT generation, therefore no new compat property
> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
> 64-bit BARs, the patch makes no difference to SeaBIOS guests.

This is not exactly true: SeaBIOS will make use of 64-bit MMIO, but
only if it cannot satisfy all the BARs from 32-bit MMIO; see
src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
assigned GPUs and you'll eventually cross that threshold, and all 64-bit
BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
devices could do the same.  Thanks,

Alex

Michael S. Tsirkin Sept. 25, 2018, 9:03 p.m. UTC | #3
On Tue, Sep 25, 2018 at 09:07:45PM +0300, Marcel Apfelbaum wrote:
> Hi Laszlo,
> 
> On 9/25/18 1:13 AM, Laszlo Ersek wrote:
> > In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> > hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> > the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> > property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> > 64-bit MMIO aperture to the guest OS for hotplug purposes.
> > 
> 
> [...]
> 
> > Also, because SeaBIOS never assigns 64-bit GPAs to
> > 64-bit BARs, the patch makes no difference to SeaBIOS guests.
> 
> And this is why we didn't catch that earlier... we need to include
> OVMF in our basic tests.

For that we need to distribute the binary with QEMU, we don't
want basic tests to have another external dependency.

Laszlo Ersek Sept. 25, 2018, 9:23 p.m. UTC | #4
On 09/25/18 22:36, Alex Williamson wrote:
> On Tue, 25 Sep 2018 00:13:46 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> [...]
>>
>> This change only affects DSDT generation, therefore no new compat property
>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.
> 
> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> only if it cannot satisfy all the BARs from 32-bit MMMIO, see
> src/fw/pciinit.c:pci_bios_map_devices.

Indeed, this mistake in the commit message briefly crossed my mind, some
time after posting the series. Then I promptly forgot about it again. :/

(Because, if SeaBIOS really never picked 64-bit addresses, then commit
9fa99d2519cb would have been mostly untestable, in the first place.)

Thank you for pointing this out!

> Create a VM with several
> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> devices could do the same.  Thanks,

OK, I think I'll have to do those ivshmem tests, and rework the commit
message a bit.

Thanks!
Laszlo

Laszlo Ersek Sept. 26, 2018, 11:12 a.m. UTC | #5
On 09/25/18 22:36, Alex Williamson wrote:
> On Tue, 25 Sep 2018 00:13:46 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> [...]
>>
>> This change only affects DSDT generation, therefore no new compat property
>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.
> 
> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> only if it cannot satisfy all the BARs from 32-bit MMMIO, see
> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> devices could do the same.  Thanks,

The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.

However, using SeaBIOS+i440fx, I can't show the difference. I've been
experimenting with various ivshmem devices (even multiple at the same
time, with different sizes). The "all or nothing" nature of SeaBIOS's
high allocation of the 64-bit BARs, combined with hugepage alignment
inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
for i440fx, seem to make it surprisingly difficult to trigger the issue.

I figure I should:

(1) remove the sentence "the patch makes no difference to SeaBIOS
guests" from the commit message,

(2) include the DSDT diff on SeaBIOS/q35 in the commit message,

(3) remain silent on SeaBIOS/i440fx, in the commit message,

(4) append a new patch, for "bios-tables-test", so that the ACPI gen
change is validated as part of the test suite, on SeaBIOS/q35.

Regarding (4):

- is it OK if I add the test only for Q35?

- what guest RAM size am I allowed to use in the test suite? In my own
SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
acceptable for the test suite.

Thanks!
Laszlo
Laszlo Ersek Sept. 26, 2018, 11:35 a.m. UTC | #6
On 09/26/18 13:12, Laszlo Ersek wrote:

> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> change is validated as part of the test suite, on SeaBIOS/q35.
> 
> Regarding (4):
> 
> - is it OK if I add the test only for Q35?
> 
> - what guest RAM size am I allowed to use in the test suite? In my own
> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> acceptable for the test suite.

And, even if the patch's effect can be shown with little guest DRAM, the
test case still requires a multi-gig ivshmem-plain device. In
"tests/ivshmem-test.c", I see how it is set up -- the backend is set up
with shm_open(). The file created under /dev/shm (on Linux) might
require host RAM just the same as normal guest DRAM (especially with
memory overcommit disabled on the host), correct?

Thanks
Laszlo
Igor Mammedov Sept. 26, 2018, 12:10 p.m. UTC | #7
On Wed, 26 Sep 2018 13:35:14 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 09/26/18 13:12, Laszlo Ersek wrote:
> 
> > (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> > change is validated as part of the test suite, on SeaBIOS/q35.
> > 
> > Regarding (4):
> > 
> > - is it OK if I add the test only for Q35?
> > 
> > - what guest RAM size am I allowed to use in the test suite? In my own
> > SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> > acceptable for the test suite.  
> 
> And, even if the patch's effect can be shown with little guest DRAM, the
> test case still requires a multi-gig ivshmem-plain device. In
> "tests/ivshmem-test.c", I see how it is set up -- the backend is set up
> with shm_open(). The file created under /dev/shm (on Linux) might
> require host RAM just the same as normal guest DRAM (especially with
> memory overcommit disabled on the host), correct?
With overcommit disabled, or cgroup limits enforced (which I'd expect
in an automated testing environment, i.e. Travis or something similar),
allocating such an amount of RAM would probably fail like crazy.

Maybe using a memdev file backend with a manually created sparse file
might actually work (with preallocation disabled)

> 
> Thanks
> Laszlo
>
Alex Williamson Sept. 26, 2018, 4:26 p.m. UTC | #8
On Wed, 26 Sep 2018 13:12:47 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 09/25/18 22:36, Alex Williamson wrote:
> > On Tue, 25 Sep 2018 00:13:46 +0200
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> >> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> >> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> >> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> >> 64-bit MMIO aperture to the guest OS for hotplug purposes.
> >>
> >> In that commit, we added or modified five functions:
> >>
> >> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
> >>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
> >>   the DIMM hotplug area too (if any).
> >>
> >> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
> >>   board-specific 64-bit base property getters called abstractly by the
> >>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
> >>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
> >>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
> >>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
> >>   GPA programmed by the firmware as the base address for the aperture).
> >>
> >> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
> >>   these intended to extend the aperture to our size recommendation,
> >>   calculated relative to the base of the aperture.
> >>
> >> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
> >> q35_host_get_pci_hole64_end() currently only extend the aperture relative
> >> to the default base (pc_pci_hole64_start()), ignoring any programming done
> >> by the firmware. This means that our size recommendation may not be met.
> >> Fix it by honoring the firmware's address assignments.
> >>
> >> The strange extension sizes were spotted by Alex, in the log of a guest
> >> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
> >>
> >> This change only affects DSDT generation, therefore no new compat property
> >> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
> >> 64-bit BARs, the patch makes no difference to SeaBIOS guests.  
> > 
> > This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> > only if it cannot satisfy all the BARs from 32-bit MMIO, see
> > src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> > assigned GPUs and you'll eventually cross that threshold and all 64-bit
> > BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> > devices could do the same.  Thanks,  
> 
> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
> 
> However, using SeaBIOS+i440fx, I can't show the difference. I've been
> experimenting with various ivshmem devices (even multiple at the same
> time, with different sizes). The "all or nothing" nature of SeaBIOS's
> high allocation of the 64-bit BARs, combined with hugepage alignment
> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
> for i440fx, seem to make it surprisingly difficult to trigger the issue.
> 
> I figure I should:
> 
> (1) remove the sentence "the patch makes no difference to SeaBIOS
> guests" from the commit message,
> 
> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
> 
> (3) remain silent on SeaBIOS/i440fx, in the commit message,
> 
> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> change is validated as part of the test suite, on SeaBIOS/q35.
> 
> Regarding (4):
> 
> - is it OK if I add the test only for Q35?
> 
> - what guest RAM size am I allowed to use in the test suite? In my own
> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> acceptable for the test suite.

Seems like you've done due diligence, the plan looks ok to me.
Regarding the test memory allocation, is it possible and reasonable to
perhaps create a 256MB shared memory area and re-use it for multiple
ivshmem devices?  I.e., rather than one 4GB ivshmem device, use sixteen
256MB devices, all with the same backing.  Thanks,

Alex
Laszlo Ersek Sept. 26, 2018, 8:26 p.m. UTC | #9
(+Eric)

On 09/26/18 14:10, Igor Mammedov wrote:
> On Wed, 26 Sep 2018 13:35:14 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 09/26/18 13:12, Laszlo Ersek wrote:
>>
>>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>>> change is validated as part of the test suite, on SeaBIOS/q35.
>>>
>>> Regarding (4):
>>>
>>> - is it OK if I add the test only for Q35?
>>>
>>> - what guest RAM size am I allowed to use in the test suite? In my own
>>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>>> acceptable for the test suite.  
>>
>> And, even if the patch's effect can be shown with little guest DRAM, the
>> test case still requires a multi-gig ivshmem-plain device. In
>> "tests/ivshmem-test.c", I see how it is set up -- the backend is set up
>> with shm_open(). The file created under /dev/shm (on Linux) might
>> require host RAM just the same as normal guest DRAM (especially with
>> memory overcommit disabled on the host), correct?
> with over commit disable or cgroups limits enforced (I'd expect that
> in automated testing env i.e. travis or something else)
> allocating such amount of RAM probably would fail like crazy.
> 
> Maybe using memdev file backend with manually created sparse file
> might actually work (with preallocate disabled)

Thanks, this sounds like a good idea.

I see shm_open() is used heavily in ivshmem-related tests. I haven't
looked much at shm_open() before. (I've always known it existed in
POSIX, but I've never cared.)

So now I first checked what shm_open() would give me over a regular
temporary file created with open(); after all, the file descriptor
returned by either would have to be mmap()'d. From the rationale in POSIX:

<http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_08_14>,

it seems like the {shm_open(), mmap()} combo has two significant
guarantees over {open(), mmap()}:

- the namespace may be distinct (there need not be a writeable
  filesystem at all),

- the shared object will *always* be locked in RAM ("Shared memory is
  not just simply providing common access to data, it is providing the
  fastest possible communication between the processes").

The rationale seems to permit, on purpose, an shm_open() implementation
that is actually based on open(), using a special file system -- and
AIUI, /dev/shm is just that, on Linux.
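To make the {shm_open(), mmap()} combo concrete, here is a minimal
sketch of the round trip (the "/gr-demo-obj" object name below is made
up for illustration; on Linux the object would appear under /dev/shm):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a named shared memory object, map it, write and read back a
 * string, then tear everything down. Returns 0 on success; distinct
 * negative codes identify the failing step. */
int shm_roundtrip(const char *name, size_t size)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        shm_unlink(name);
        return -2;
    }
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping outlives the fd */
    if (p == MAP_FAILED) {
        shm_unlink(name);
        return -3;
    }
    strcpy(p, "hello");              /* dirty one page of the object */
    int ok = (strcmp(p, "hello") == 0);
    munmap(p, size);
    shm_unlink(name);                /* drop the name, like unlink(2) */
    return ok ? 0 : -4;
}
```

Note that the object lives in a separate namespace: no ordinary file is
created, which is exactly the property that matters for the RAM-pressure
question below.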

Eric, does the above sound more or less correct?

If it is correct, then I think shm_open() is exactly what I *don't* want
for this use case. Because, while I do need a pathname for an
mmap()-able object (regular file, or otherwise), just so I can do:

  -object memory-backend-file,id=mem-obj,...,mem-path=... \
  -device ivshmem-plain,memdev=mem-obj,...

, I want the underlying object to put as little pressure as possible on
the system that runs the test suite.

This means I should specifically ask for a regular file, to be mmap()'d
(with MAP_SHARED). Then the kernel knows in advance that it can always
page out the dirty stuff, and the mapping shouldn't clash with cgroups,
or disabled memory overcommit.

Now, in order to make that actually safe, I should in theory ask for
preallocation on the filesystem (otherwise, if the filesystem runs out
of space, while the kernel is allocating fs extents in order to flush
the dirty pages to them, the process gets a SIGBUS, IIRC). However,
because I know that nothing will in fact be dirtied, I can minimize the
footprint on the filesystem as well, and forgo preallocation too.

This suggests that, in my test case,
- I call g_file_open_tmp() for creating the temporary file,
- pass the returned fd to ftruncate() for resizing the temporary file,
- pass the returned pathname to the "memory-backend-file" objects, in
  the "mem-path" property,
- set "share=on",
- set "prealloc=off",
- "discard-data" is irrelevant (there won't be any dirty pages).
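A rough C sketch of that backing-file approach, with mkstemp() standing
in for g_file_open_tmp() (the path template and sizes are illustrative):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a sparse temporary file of the given apparent size, mmap() it
 * shared (as a "memory-backend-file" with share=on would), and report
 * how many bytes the filesystem actually allocated for it.
 * Returns the allocated byte count, or (off_t)-1 on error. */
off_t sparse_backing_alloc(size_t size)
{
    char path[] = "/tmp/pci-hole64-XXXXXX";   /* illustrative template */
    int fd = mkstemp(path);
    if (fd < 0)
        return (off_t)-1;
    unlink(path);                    /* file vanishes when fd closes */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return (off_t)-1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return (off_t)-1;
    }
    /* Nothing is dirtied, so no extents need to be allocated and no
     * pages need to hit the disk. */
    struct stat st;
    fstat(fd, &st);
    munmap(p, size);
    close(fd);
    return (off_t)st.st_blocks * 512;  /* allocated, not apparent, size */
}
```

On common filesystems the allocated size stays far below the apparent
size, which is the whole point of skipping preallocation here.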

Thanks
Laszlo
Laszlo Ersek Sept. 26, 2018, 8:35 p.m. UTC | #10
(+Eric)

On 09/26/18 18:26, Alex Williamson wrote:
> On Wed, 26 Sep 2018 13:12:47 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 09/25/18 22:36, Alex Williamson wrote:
>>> On Tue, 25 Sep 2018 00:13:46 +0200
>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>   
>>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>>>> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
>>>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>>>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>>>
>>>> In that commit, we added or modified five functions:
>>>>
>>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
>>>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>>>>   the DIMM hotplug area too (if any).
>>>>
>>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>>>   board-specific 64-bit base property getters called abstractly by the
>>>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>>>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
>>>>   GPA programmed by the firmware as the base address for the aperture).
>>>>
>>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>>>   these intended to extend the aperture to our size recommendation,
>>>>   calculated relative to the base of the aperture.
>>>>
>>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>>>> to the default base (pc_pci_hole64_start()), ignoring any programming done
>>>> by the firmware. This means that our size recommendation may not be met.
>>>> Fix it by honoring the firmware's address assignments.
>>>>
>>>> The strange extension sizes were spotted by Alex, in the log of a guest
>>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>>>
>>>> This change only affects DSDT generation, therefore no new compat property
>>>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>>>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.  
>>>
>>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
>>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
>>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
>>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
>>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
>>> devices could do the same.  Thanks,  
>>
>> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
>> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
>>
>> However, using SeaBIOS+i440fx, I can't show the difference. I've been
>> experimenting with various ivshmem devices (even multiple at the same
>> time, with different sizes). The "all or nothing" nature of SeaBIOS's
>> high allocation of the 64-bit BARs, combined with hugepage alignment
>> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
>> for i440fx, seem to make it surprisingly difficult to trigger the issue.
>>
>> I figure I should:
>>
>> (1) remove the sentence "the patch makes no difference to SeaBIOS
>> guests" from the commit message,
>>
>> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
>>
>> (3) remain silent on SeaBIOS/i440fx, in the commit message,
>>
>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>> change is validated as part of the test suite, on SeaBIOS/q35.
>>
>> Regarding (4):
>>
>> - is it OK if I add the test only for Q35?
>>
>> - what guest RAM size am I allowed to use in the test suite? In my own
>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>> acceptable for the test suite.
> 
> Seems like you've done due diligence, the plan looks ok to me.
> Regarding the test memory allocation, is it possible and reasonable to
> perhaps create a 256MB shared memory area and re-use it for multiple
> ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
> devices, all with the same backing.  Thanks,

This too sounds useful. AIUI, ftruncate() is neither forbidden nor
required to allocate filesystem extents when increasing the size of a
file. Using one smaller regular temporary file as the common foundation
for multiple "memory-backend-file" objects will save space on the fs if
ftruncate() happens to allocate extents.

(I've also thought of passing the same "memory-backend-file" object to
multiple ivshmem-plain devices, but ivshmem_plain_realize()
[hw/misc/ivshmem.c] checks whether the HostMemoryBackend is already mapped.)

Thanks!
Laszlo
Laszlo Ersek Sept. 26, 2018, 10:31 p.m. UTC | #11
On 09/26/18 18:26, Alex Williamson wrote:
> On Wed, 26 Sep 2018 13:12:47 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 09/25/18 22:36, Alex Williamson wrote:
>>> On Tue, 25 Sep 2018 00:13:46 +0200
>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>   
>>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>>>> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
>>>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>>>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>>>
>>>> In that commit, we added or modified five functions:
>>>>
>>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
>>>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>>>>   the DIMM hotplug area too (if any).
>>>>
>>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>>>   board-specific 64-bit base property getters called abstractly by the
>>>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>>>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
>>>>   GPA programmed by the firmware as the base address for the aperture).
>>>>
>>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>>>   these intended to extend the aperture to our size recommendation,
>>>>   calculated relative to the base of the aperture.
>>>>
>>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>>>> to the default base (pc_pci_hole64_start()), ignoring any programming done
>>>> by the firmware. This means that our size recommendation may not be met.
>>>> Fix it by honoring the firmware's address assignments.
>>>>
>>>> The strange extension sizes were spotted by Alex, in the log of a guest
>>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>>>
>>>> This change only affects DSDT generation, therefore no new compat property
>>>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>>>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.  
>>>
>>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> >>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
>>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
>>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
>>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
>>> devices could do the same.  Thanks,  
>>
>> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
>> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
>>
>> However, using SeaBIOS+i440fx, I can't show the difference. I've been
>> experimenting with various ivshmem devices (even multiple at the same
>> time, with different sizes). The "all or nothing" nature of SeaBIOS's
>> high allocation of the 64-bit BARs, combined with hugepage alignment
>> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
>> for i440fx, seem to make it surprisingly difficult to trigger the issue.
>>
>> I figure I should:
>>
>> (1) remove the sentence "the patch makes no difference to SeaBIOS
>> guests" from the commit message,
>>
>> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
>>
>> (3) remain silent on SeaBIOS/i440fx, in the commit message,
>>
>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>> change is validated as part of the test suite, on SeaBIOS/q35.
>>
>> Regarding (4):
>>
>> - is it OK if I add the test only for Q35?
>>
>> - what guest RAM size am I allowed to use in the test suite? In my own
>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>> acceptable for the test suite.
> 
> Seems like you've done due diligence, the plan looks ok to me.
> Regarding the test memory allocation, is it possible and reasonable to
> perhaps create a 256MB shared memory area and re-use it for multiple
> ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
> devices, all with the same backing.  Thanks,

In order to show that the patch makes a difference, on SeaBIOS+Q35, I
must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
/ end-of-DIMM-hotplug-area (i.e., strictly past the return value of
pc_pci_hole64_start()). That's not easy, assuming test suite constraints:

- With a small guest RAM size (and no DIMM hotplug area),
pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned
for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR
too. Therefore the patch makes no difference.

- If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by
adding a DIMM hotplug area, it's not helpful. Because the end of the
DIMM area ("etc/reserved-memory-end") is rounded up to 1GB alignment
automatically. Similarly, pc_pci_hole64_start() is rounded up to 1GB
alignment automatically. Thus, the BARs remain aligned.

- If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size by
using more guest RAM (e.g. 2049 MB --> the low RAM split is at 2GB on
Q35, so RAM would end at 4GB+1MB), then it's the same as the previous
bullet, because pc_pci_hole64_start() is rounded up to 1GB alignment,
and the BARs remain aligned.

- If I try to mis-align pc_pci_hole64_start() (again, 4GB, with small
guest RAM) for the ivshmem BAR size by changing the BAR size, then I
must pick an 8GB BAR size at least. And then I can't use the 256MB
"fragmentation".

- I guess I could ensure the mis-alignment by using small guest RAM
(128MB), a single DIMM hotplug slot so that reserved-memory-end is
rounded up to 5GB (through the 1GB alignment), and a single ivshmem
device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
Is this the best compromise?

(The last option is basically an optimization of my previous reproducer,
namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
would end at an *odd* GB number -- we can optimize that by using an
empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
optimized to the smallest *even* (positive) number of GBs, namely 2GB.
But we can't do anything about the pervasive GB-alignment.)
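The alignment arithmetic above can be modeled with a few lines of C (the
constants mirror the scenarios described; this is an illustration, not
QEMU's actual BAR placement code):

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)
#define MiB (1ULL << 20)

/* Round x up to the next multiple of align (align is a power of two). */
uint64_t round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Model: the firmware places the lowest 64-bit BAR of the given size at
 * the next naturally (size-)aligned address at or above the start of
 * free 64-bit space. */
uint64_t fw_bar_base(uint64_t hole64_start, uint64_t bar_size)
{
    return round_up(hole64_start, bar_size);
}
```

With reserved-memory-end rounded up to 5GB, a 2GB BAR lands at 6GB,
strictly above the default pc_pci_hole64_start() of 5GB; a 256MB BAR at
a 4GB start stays put, which is why the small-BAR cases show no
difference.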

Thanks
Laszlo
Gerd Hoffmann Sept. 27, 2018, 5:48 a.m. UTC | #12
Hi,

> > Maybe using memdev file backend with manually created sparse file
> > might actually work (with preallocate disabled)
> 
> Thanks, this sounds like a good idea.
> 
> I see shm_open() is used heavily in ivshmem-related tests. I haven't
> looked much at shm_open() before. (I've always known it existed in
> POSIX, but I've never cared.)

How about improving the lovely pci-testdev we have a bit?  Then we can
have huge pci bars without needing backing storage for them ...

HTH,
  Gerd

=========================== cut here ===========================
From 05cfced149b0b5c953391666c3151034bc7fe88b Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Thu, 27 Sep 2018 07:43:10 +0200
Subject: [PATCH] pci-testdev: add optional memory bar

Add memory bar to pci-testdev.  Size is configurable using the membar
property.  Setting the size to zero (default) turns it off.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 hw/misc/pci-testdev.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
index 32041f535f..af4d678ee4 100644
--- a/hw/misc/pci-testdev.c
+++ b/hw/misc/pci-testdev.c
@@ -85,6 +85,9 @@ typedef struct PCITestDevState {
     MemoryRegion portio;
     IOTest *tests;
     int current;
+
+    size_t membar_size;
+    MemoryRegion membar;
 } PCITestDevState;
 
 #define TYPE_PCI_TEST_DEV "pci-testdev"
@@ -253,6 +256,15 @@ static void pci_testdev_realize(PCIDevice *pci_dev, Error **errp)
     pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
     pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->portio);
 
+    if (d->membar_size) {
+        memory_region_init(&d->membar, OBJECT(d), "membar", d->membar_size);
+        pci_register_bar(pci_dev, 2,
+                         PCI_BASE_ADDRESS_SPACE_MEMORY |
+                         PCI_BASE_ADDRESS_MEM_PREFETCH |
+                         PCI_BASE_ADDRESS_MEM_TYPE_64,
+                         &d->membar);
+    }
+
     d->current = -1;
     d->tests = g_malloc0(IOTEST_MAX * sizeof *d->tests);
     for (i = 0; i < IOTEST_MAX; ++i) {
@@ -305,6 +317,11 @@ static void qdev_pci_testdev_reset(DeviceState *dev)
     pci_testdev_reset(d);
 }
 
+static Property pci_testdev_properties[] = {
+    DEFINE_PROP_SIZE("membar", PCITestDevState, membar_size, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void pci_testdev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -319,6 +336,7 @@ static void pci_testdev_class_init(ObjectClass *klass, void *data)
     dc->desc = "PCI Test Device";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     dc->reset = qdev_pci_testdev_reset;
+    dc->props = pci_testdev_properties;
 }
 
 static const TypeInfo pci_testdev_info = {
Igor Mammedov Sept. 27, 2018, 7:03 a.m. UTC | #13
On Thu, 27 Sep 2018 00:31:07 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 09/26/18 18:26, Alex Williamson wrote:
> > On Wed, 26 Sep 2018 13:12:47 +0200
> > Laszlo Ersek <lersek@redhat.com> wrote:
> >   
> >> On 09/25/18 22:36, Alex Williamson wrote:  
> >>> On Tue, 25 Sep 2018 00:13:46 +0200
> >>> Laszlo Ersek <lersek@redhat.com> wrote:
> >>>     
> >>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
> >>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
> >>>> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
> >>>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
> >>>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
> >>>>
> >>>> In that commit, we added or modified five functions:
> >>>>
> >>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
> >>>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
> >>>>   the DIMM hotplug area too (if any).
> >>>>
> >>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
> >>>>   board-specific 64-bit base property getters called abstractly by the
> >>>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
> >>>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
> >>>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
> >>>>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
> >>>>   GPA programmed by the firmware as the base address for the aperture).
> >>>>
> >>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
> >>>>   these intended to extend the aperture to our size recommendation,
> >>>>   calculated relative to the base of the aperture.
> >>>>
> >>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
> >>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
> >>>> to the default base (pc_pci_hole64_start()), ignoring any programming done
> >>>> by the firmware. This means that our size recommendation may not be met.
> >>>> Fix it by honoring the firmware's address assignments.
> >>>>
> >>>> The strange extension sizes were spotted by Alex, in the log of a guest
> >>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
> >>>>
> >>>> This change only affects DSDT generation, therefore no new compat property
> >>>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
> >>>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.    
> >>>
> >>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
> >>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
> >>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
> >>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
> >>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
> >>> devices could do the same.  Thanks,    
> >>
> >> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
> >> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
> >>
> >> However, using SeaBIOS+i440fx, I can't show the difference. I've been
> >> experimenting with various ivshmem devices (even multiple at the same
> >> time, with different sizes). The "all or nothing" nature of SeaBIOS's
> >> high allocation of the 64-bit BARs, combined with hugepage alignment
> >> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
> >> for i440fx, seem to make it surprisingly difficult to trigger the issue.
> >>
> >> I figure I should:
> >>
> >> (1) remove the sentence "the patch makes no difference to SeaBIOS
> >> guests" from the commit message,
> >>
> >> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
> >>
> >> (3) remain silent on SeaBIOS/i440fx, in the commit message,
> >>
> >> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
> >> change is validated as part of the test suite, on SeaBIOS/q35.
> >>
> >> Regarding (4):
> >>
> >> - is it OK if I add the test only for Q35?
> >>
> >> - what guest RAM size am I allowed to use in the test suite? In my own
> >> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
> >> acceptable for the test suite.  
> > 
> > Seems like you've done due diligence, the plan looks ok to me.
> > Regarding the test memory allocation, is it possible and reasonable to
> > perhaps create a 256MB shared memory area and re-use it for multiple
> > ivshmem devices?  ie. rather than 1, 4GB ivshmem device, use 16, 256MB
> > devices, all with the same backing.  Thanks,  
> 
> In order to show that the patch makes a difference, on SeaBIOS+Q35, I
> must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
> / end-of-DIMM-hotplug-area (i.e., strictly past the return value of
> pc_pci_hole64_start()). That's not easy, assuming test suite constraints:
> 
> - With a small guest RAM size (and no DIMM hotplug area),
> pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned
> for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR
> too. Therefore the patch makes no difference.
> 
> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by
> adding a DIMM hotplug area, it's not helpful. Because the end of the
> DIMM area ("etc/reserved-memory-end") is rounded up to 1GB alignment
> automatically. Similarly, pc_pci_hole64_start() is rounded up to 1GB
> alignment automatically. Thus, the BARs remain aligned.
> 
> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size by
> using more guest RAM (e.g. 2049 MB --> the low RAM split is at 2GB on
> Q35, so RAM would end at 4GB+1MB), then it's the same as the previous
> bullet, because pc_pci_hole64_start() is rounded up to 1GB alignment,
> and the BARs remain aligned.
> 
> - If I try to mis-align pc_pci_hole64_start() (again, 4GB, with small
> guest RAM) for the ivshmem BAR size by changing the BAR size, then I
> must pick a 8GB BAR size at least. And then I can't use the 256MB
> "fragmentation".
> 
> - I guess I could ascertain the mis-alignment by using small guest RAM
> (128MB), a single DIMM hotplug slot so that reserved-memory-end is
> rounded up to 5GB (through the 1GB alignment), and a single ivshmem
> device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
> Is this the best compromise?
> 
> (The last option is basically an optimization of my previous reproducer,
> namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
> would end at an *odd* GB number -- we can optimize that by using an
> empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
> optimized to the smallest *even* (positive) number of GBs, namely 2GB.
that probably would do the job

> But we can't do anything about the pervasive GB-alignment.)
I haven't read the code to be sure, but at first glance, thanks to the
earlier brokenness, it might be possible to mis-align the pci_hole64
start by using a combination of
 broken_reserved_end=true && enforce_aligned_dimm=false

That's assuming an old machine type that has this combo is otherwise
still suitable for the test case.

> 
> Thanks
> Laszlo
Laszlo Ersek Sept. 27, 2018, 9:11 a.m. UTC | #14
On 09/27/18 09:03, Igor Mammedov wrote:
> On Thu, 27 Sep 2018 00:31:07 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 09/26/18 18:26, Alex Williamson wrote:
>>> On Wed, 26 Sep 2018 13:12:47 +0200
>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>   
>>>> On 09/25/18 22:36, Alex Williamson wrote:  
>>>>> On Tue, 25 Sep 2018 00:13:46 +0200
>>>>> Laszlo Ersek <lersek@redhat.com> wrote:
>>>>>     
>>>>>> In commit 9fa99d2519cb ("hw/pci-host: Fix x86 Host Bridges 64bit PCI
>>>>>> hole", 2017-11-16), we meant to expose such a 64-bit PCI MMIO aperture in
>>>>>> the ACPI DSDT that would be at least as large as the new "pci-hole64-size"
>>>>>> property (2GB on i440fx, 32GB on q35). The goal was to offer "enough"
>>>>>> 64-bit MMIO aperture to the guest OS for hotplug purposes.
>>>>>>
>>>>>> In that commit, we added or modified five functions:
>>>>>>
>>>>>> - pc_pci_hole64_start(): shared between i440fx and q35. Provides a default
>>>>>>   64-bit base, which starts beyond the cold-plugged 64-bit RAM, and skips
>>>>>>   the DIMM hotplug area too (if any).
>>>>>>
>>>>>> - i440fx_pcihost_get_pci_hole64_start(), q35_host_get_pci_hole64_start():
>>>>>>   board-specific 64-bit base property getters called abstractly by the
>>>>>>   ACPI generator. Both of these fall back to pc_pci_hole64_start() if the
>>>>>>   firmware didn't program any 64-bit hole (i.e. if the firmware didn't
>>>>>>   assign a 64-bit GPA to any MMIO BAR on any device). Otherwise, they
>>>>>>   honor the firmware's BAR assignments (i.e., they treat the lowest 64-bit
>>>>>>   GPA programmed by the firmware as the base address for the aperture).
>>>>>>
>>>>>> - i440fx_pcihost_get_pci_hole64_end(), q35_host_get_pci_hole64_end():
>>>>>>   these intended to extend the aperture to our size recommendation,
>>>>>>   calculated relative to the base of the aperture.
>>>>>>
>>>>>> Despite the original intent, i440fx_pcihost_get_pci_hole64_end() and
>>>>>> q35_host_get_pci_hole64_end() currently only extend the aperture relative
>>>>>> to the default base (pc_pci_hole64_start()), ignoring any programming done
>>>>>> by the firmware. This means that our size recommendation may not be met.
>>>>>> Fix it by honoring the firmware's address assignments.
>>>>>>
>>>>>> The strange extension sizes were spotted by Alex, in the log of a guest
>>>>>> kernel running on top of OVMF (which does assign 64-bit GPAs to BARs).
>>>>>>
>>>>>> This change only affects DSDT generation, therefore no new compat property
>>>>>> is being introduced. Also, because SeaBIOS never assigns 64-bit GPAs to
>>>>>> 64-bit BARs, the patch makes no difference to SeaBIOS guests.    
>>>>>
>>>>> This is not exactly true, SeaBIOS will make use of 64-bit MMIO, but
>>>>> only if it cannot satisfy all the BARs from 32-bit MMIO, see
>>>>> src/fw/pciinit.c:pci_bios_map_devices.  Create a VM with several
>>>>> assigned GPUs and you'll eventually cross that threshold and all 64-bit
>>>>> BARs will be moved above 4G.  I'm sure a few sufficiently sized ivshmem
>>>>> devices could do the same.  Thanks,    
>>>>
>>>> The effect of this patch is not hard to demonstrate with SeaBIOS+Q35,
>>>> when using e.g. 5GB of guest RAM and a 4GB ivshmem-plain device.
>>>>
>>>> However, using SeaBIOS+i440fx, I can't show the difference. I've been
>>>> experimenting with various ivshmem devices (even multiple at the same
>>>> time, with different sizes). The "all or nothing" nature of SeaBIOS's
>>>> high allocation of the 64-bit BARs, combined with hugepage alignment
>>>> inside SeaBIOS, combined with the small (2GB) rounding size used in QEMU
>>>> for i440fx, seem to make it surprisingly difficult to trigger the issue.
>>>>
>>>> I figure I should:
>>>>
>>>> (1) remove the sentence "the patch makes no difference to SeaBIOS
>>>> guests" from the commit message,
>>>>
>>>> (2) include the DSDT diff on SeaBIOS/q35 in the commit message,
>>>>
>>>> (3) remain silent on SeaBIOS/i440fx, in the commit message,
>>>>
>>>> (4) append a new patch, for "bios-tables-test", so that the ACPI gen
>>>> change is validated as part of the test suite, on SeaBIOS/q35.
>>>>
>>>> Regarding (4):
>>>>
>>>> - is it OK if I add the test only for Q35?
>>>>
>>>> - what guest RAM size am I allowed to use in the test suite? In my own
>>>> SeaBIOS/Q35 reproducer I currently use 5GB, but I'm not sure if it's
>>>> acceptable for the test suite.  
>>>
>>> Seems like you've done due diligence, the plan looks ok to me.
>>> Regarding the test memory allocation, is it possible and reasonable to
>>> perhaps create a 256MB shared memory area and re-use it for multiple
>>> ivshmem devices?  i.e. rather than one 4GB ivshmem device, use sixteen
>>> 256MB devices, all with the same backing.  Thanks,  
>>
>> In order to show that the patch makes a difference, on SeaBIOS+Q35, I
>> must have SeaBIOS place the lowest 64-bit BAR strictly above end-of-RAM
>> / end-of-DIMM-hotplug-area (i.e., strictly past the return value of
>> pc_pci_hole64_start()). That's not easy, assuming test suite constraints:
>>
>> - With a small guest RAM size (and no DIMM hotplug area),
>> pc_pci_hole64_start() returns 4GB. Unfortunately, 4GB is well aligned
>> for 256MB BARs, so that's where SeaBIOS will place the lowest such BAR
>> too. Therefore the patch makes no difference.
>>
>> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size, by
>> adding a DIMM hotplug area, it's not helpful. Because the end of the
>> DIMM area ("etc/reserved-memory-end") is rounded up to 1GB alignment
>> automatically. Similarly, pc_pci_hole64_start() is rounded up to 1GB
>> alignment automatically. Thus, the BARs remain aligned.
>>
>> - If I try to mis-align pc_pci_hole64_start() for the 256MB BAR size by
>> using more guest RAM (e.g. 2049 MB --> the low RAM split is at 2GB on
>> Q35, so RAM would end at 4GB+1MB), then it's the same as the previous
>> bullet, because pc_pci_hole64_start() is rounded up to 1GB alignment,
>> and the BARs remain aligned.
>>
>> - If I try to mis-align pc_pci_hole64_start() (again, 4GB, with small
>> guest RAM) for the ivshmem BAR size by changing the BAR size, then I
>> must pick an 8GB BAR size at least. And then I can't use the 256MB
>> "fragmentation".
>>
>> - I guess I could ascertain the mis-alignment by using small guest RAM
>> (128MB), a single DIMM hotplug slot so that reserved-memory-end is
>> rounded up to 5GB (through the 1GB alignment), and a single ivshmem
>> device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
>> Is this the best compromise?
>>
>> (The last option is basically an optimization of my previous reproducer,
>> namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
>> would end at an *odd* GB number -- we can optimize that by using an
>> empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
>> optimized to the smallest *even* (positive) number of GBs, namely 2GB.
> that probably would do the job

OK, thanks.

>> But we can't do anything about the pervasive GB-alignment.)
> I haven't read the code to be sure but at first glance, thanks to
> brokenness before, it might be possible to mis-align pci_hole64 start
> if one would use a combination of
>  broken_reserved_end=true && enforce_aligned_dimm=false
> 
> That's assuming that an old machine type that has this combo is still
> suitable for the testcase otherwise.

I thought of messing with manual compat property overrides (or old
machine types), but I'd prefer not to go there. The code I'm patching
already depends on a compat property ("x-pci-hole64-fix", for both
"i440FX-pcihost" and "q35-pcihost"), and that one is set to "off" up to
and including PC_COMPAT_2_10. Whereas the latest machine types (i440fx
and q35) where "broken_reserved_end" is true are 2.4. Which leaves me
with an empty set.

I'll start work on the "best compromise" method.

Thanks!
Laszlo
Laszlo Ersek Sept. 27, 2018, 9:21 a.m. UTC | #15
On 09/27/18 07:48, Gerd Hoffmann wrote:
>   Hi,
> 
>>> Maybe using memdev file backend with manually created sparse file
>>> might actually work (with preallocate disabled)
>>
>> Thanks, this sounds like a good idea.
>>
>> I see shm_open() is used heavily in ivshmem-related tests. I haven't
>> looked much at shm_open() before. (I've always known it existed in
>> POSIX, but I've never cared.)
> 
> How about improving the lovely pci-testdev we have a bit?  Then we can
> have huge pci bars without needing backing storage for them ...

Earlier I checked

  qemu-system-x86_64 -device pci-testdev,\?

briefly. I didn't see anything relevant and figured it was out of scope.

(
Actually, I've thought of yet another thing: the resource reservation
capability for PCIe Root Ports. The 64-bit reservation field in that cap
structure should be perfectly suitable for this testing.

Except, of course, SeaBIOS only handles the bus number reservation
field, at this time. :/
)

Looking at "docs/specs/pci-testdev.txt" now, extending the PCI testdev
looks like a separate project to me. We're getting quite far from the
original goal. I totally agree that if the PCI testdev already had this
ability, it would be a perfect fit, but I don't think I can personally
take that on now.

Thanks
Laszlo

> =========================== cut here ===========================
> From 05cfced149b0b5c953391666c3151034bc7fe88b Mon Sep 17 00:00:00 2001
> From: Gerd Hoffmann <kraxel@redhat.com>
> Date: Thu, 27 Sep 2018 07:43:10 +0200
> Subject: [PATCH] pci-testdev: add optional memory bar
> 
> Add memory bar to pci-testdev.  Size is configurable using the membar
> property.  Setting the size to zero (default) turns it off.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  hw/misc/pci-testdev.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
> index 32041f535f..af4d678ee4 100644
> --- a/hw/misc/pci-testdev.c
> +++ b/hw/misc/pci-testdev.c
> @@ -85,6 +85,9 @@ typedef struct PCITestDevState {
>      MemoryRegion portio;
>      IOTest *tests;
>      int current;
> +
> +    size_t membar_size;
> +    MemoryRegion membar;
>  } PCITestDevState;
>  
>  #define TYPE_PCI_TEST_DEV "pci-testdev"
> @@ -253,6 +256,15 @@ static void pci_testdev_realize(PCIDevice *pci_dev, Error **errp)
>      pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
>      pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->portio);
>  
> +    if (d->membar_size) {
> +        memory_region_init(&d->membar, OBJECT(d), "membar", d->membar_size);
> +        pci_register_bar(pci_dev, 2,
> +                         PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                         PCI_BASE_ADDRESS_MEM_PREFETCH |
> +                         PCI_BASE_ADDRESS_MEM_TYPE_64,
> +                         &d->membar);
> +    }
> +
>      d->current = -1;
>      d->tests = g_malloc0(IOTEST_MAX * sizeof *d->tests);
>      for (i = 0; i < IOTEST_MAX; ++i) {
> @@ -305,6 +317,11 @@ static void qdev_pci_testdev_reset(DeviceState *dev)
>      pci_testdev_reset(d);
>  }
>  
> +static Property pci_testdev_properties[] = {
> +    DEFINE_PROP_SIZE("membar", PCITestDevState, membar_size, 0),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
>  static void pci_testdev_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -319,6 +336,7 @@ static void pci_testdev_class_init(ObjectClass *klass, void *data)
>      dc->desc = "PCI Test Device";
>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>      dc->reset = qdev_pci_testdev_reset;
> +    dc->props = pci_testdev_properties;
>  }
>  
>  static const TypeInfo pci_testdev_info = {
>
Laszlo Ersek Sept. 27, 2018, 9:27 a.m. UTC | #16
On 09/27/18 11:21, Laszlo Ersek wrote:
> On 09/27/18 07:48, Gerd Hoffmann wrote:
>>   Hi,
>>
>>>> Maybe using memdev file backend with manually created sparse file
>>>> might actually work (with preallocate disabled)
>>>
>>> Thanks, this sounds like a good idea.
>>>
>>> I see shm_open() is used heavily in ivshmem-related tests. I haven't
>>> looked much at shm_open() before. (I've always known it existed in
>>> POSIX, but I've never cared.)
>>
>> How about improving the lovely pci-testdev we have a bit?  Then we can
>> have huge pci bars without needing backing storage for them ...
> 
> Earlier I checked
> 
>   qemu-system-x86_64 -device pci-testdev,\?
> 
> briefly. I didn't see anything relevant and figured it was out of scope.
> 
> (
> Actually, I've thought of yet another thing: the resource reservation
> capability for PCIe Root Ports. The 64-bit reservation field in that cap
> structure should be perfectly suitable for this testing.
> 
> Except, of course, SeaBIOS only handles the bus number reservation
> field, at this time. :/
> )
> 
> Looking at "docs/specs/pci-testdev.txt" now, extending the PCI testdev
> looks like a separate project to me. We're getting quite far from the
> original goal. I totally agree that if the PCI testdev already had this
> ability, it would be a perfect fit, but I don't think I can personally
> take that on now.

I realize you pasted a prototype patch below (thanks for that!), but it
doesn't update the documentation. I wouldn't like to dig down another
design rabbit hole here.

If you can pursue upstreaming this change, I can definitely delay my
series, and use the enhanced pci testdev as a reproducer. But, I'm not
explicitly asking you to pursue it -- none of this classifies as a
"serious bug", so these patches might not be the best use of our time.

Laszlo

>> =========================== cut here ===========================
>> From 05cfced149b0b5c953391666c3151034bc7fe88b Mon Sep 17 00:00:00 2001
>> From: Gerd Hoffmann <kraxel@redhat.com>
>> Date: Thu, 27 Sep 2018 07:43:10 +0200
>> Subject: [PATCH] pci-testdev: add optional memory bar
>>
>> Add memory bar to pci-testdev.  Size is configurable using the membar
>> property.  Setting the size to zero (default) turns it off.
>>
>> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
>> ---
>>  hw/misc/pci-testdev.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/hw/misc/pci-testdev.c b/hw/misc/pci-testdev.c
>> index 32041f535f..af4d678ee4 100644
>> --- a/hw/misc/pci-testdev.c
>> +++ b/hw/misc/pci-testdev.c
>> @@ -85,6 +85,9 @@ typedef struct PCITestDevState {
>>      MemoryRegion portio;
>>      IOTest *tests;
>>      int current;
>> +
>> +    size_t membar_size;
>> +    MemoryRegion membar;
>>  } PCITestDevState;
>>  
>>  #define TYPE_PCI_TEST_DEV "pci-testdev"
>> @@ -253,6 +256,15 @@ static void pci_testdev_realize(PCIDevice *pci_dev, Error **errp)
>>      pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
>>      pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->portio);
>>  
>> +    if (d->membar_size) {
>> +        memory_region_init(&d->membar, OBJECT(d), "membar", d->membar_size);
>> +        pci_register_bar(pci_dev, 2,
>> +                         PCI_BASE_ADDRESS_SPACE_MEMORY |
>> +                         PCI_BASE_ADDRESS_MEM_PREFETCH |
>> +                         PCI_BASE_ADDRESS_MEM_TYPE_64,
>> +                         &d->membar);
>> +    }
>> +
>>      d->current = -1;
>>      d->tests = g_malloc0(IOTEST_MAX * sizeof *d->tests);
>>      for (i = 0; i < IOTEST_MAX; ++i) {
>> @@ -305,6 +317,11 @@ static void qdev_pci_testdev_reset(DeviceState *dev)
>>      pci_testdev_reset(d);
>>  }
>>  
>> +static Property pci_testdev_properties[] = {
>> +    DEFINE_PROP_SIZE("membar", PCITestDevState, membar_size, 0),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>>  static void pci_testdev_class_init(ObjectClass *klass, void *data)
>>  {
>>      DeviceClass *dc = DEVICE_CLASS(klass);
>> @@ -319,6 +336,7 @@ static void pci_testdev_class_init(ObjectClass *klass, void *data)
>>      dc->desc = "PCI Test Device";
>>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>      dc->reset = qdev_pci_testdev_reset;
>> +    dc->props = pci_testdev_properties;
>>  }
>>  
>>  static const TypeInfo pci_testdev_info = {
>>
>
Gerd Hoffmann Sept. 27, 2018, 12:13 p.m. UTC | #17
Hi,

> I realize you pasted a prototype patch below (thanks for that!), but it
> doesn't update the documentation. I wouldn't like to dig down another
> design rabbit hole here.
> 
> If you can pursue upstreaming this change, I can definitely delay my
> series, and use the enhanced pci testdev as a reproducer.

Patch sent.

> But, I'm not explicitly asking you to pursue it -- none of this
> classifies as a "serious bug", so these patches might not be the best
> use of our time.

Well, jumping through hoops to get the tests going with ivshmem + memory
overcommit isn't that great a use of our time either ...

cheers,
  Gerd
Eric Blake Sept. 27, 2018, 2:50 p.m. UTC | #18
On 9/27/18 4:21 AM, Laszlo Ersek wrote:
> On 09/27/18 07:48, Gerd Hoffmann wrote:
>>    Hi,
>>
>>>> Maybe using memdev file backend with manually created sparse file
>>>> might actually work (with preallocate disabled)
>>>
>>> Thanks, this sounds like a good idea.
>>>
>>> I see shm_open() is used heavily in ivshmem-related tests. I haven't
>>> looked much at shm_open() before. (I've always known it existed in
>>> POSIX, but I've never cared.)
>>
>> How about improving the lovely pci-testdev we have a bit?  Then we can
>> have huge pci bars without needing backing storage for them ...
> 
> Earlier I checked
> 
>    qemu-system-x86_64 -device pci-testdev,\?

Side note: While that works, we are trying to shift the documentation 
over to recommending -device pci-testdev,help instead, as then you don't 
need to remember to shell-quote the ? to prevent unintended globbing.
Eric Blake Sept. 27, 2018, 3:15 p.m. UTC | #19
On 9/26/18 3:26 PM, Laszlo Ersek wrote:
> (+Eric)
> 

> I see shm_open() is used heavily in ivshmem-related tests. I haven't
> looked much at shm_open() before. (I've always known it existed in
> POSIX, but I've never cared.)

I've never actually played with shm_open() myself, but understand the 
theory of it enough to reply.

> 
> So now I first checked what shm_open() would give me over a regular
> temporary file created with open(); after all, the file descriptor
> returned by either would have to be mmap()'d. From the rationale in POSIX:
> 
> <http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xsh_chap02.html#tag_22_02_08_14>,
> 
> it seems like the {shm_open(), mmap()} combo has two significant
> guarantees over {open(), mmap()}:
> 
> - the namespace may be distinct (there need not be a writeable
>    filesystem at all),
> 
> - the shared object will *always* be locked in RAM ("Shared memory is
>    not just simply providing common access to data, it is providing the
>    fastest possible communication between the processes").
> 
> The rationale seems to permit, on purpose, an shm_open() implementation
> that is actually based on open(), using a special file system -- and
> AIUI, /dev/shm is just that, on Linux.
> 
> Eric, does the above sound more or less correct?

You're right about it being permitted to be a distinct namespace; on the 
other hand, it doesn't even have to be a special file system.  An 
implementation could even use a compile-time fixed directory name 
visible to the rest of the system (although of course you shouldn't rely 
on being able to use the file system to poke at the shmem objects, nor 
should you manipulate the file system underneath the reserved directory 
behind shmem's back if that is what the implementation is using).  So 
I'm less certain of whether you are guaranteed that the shared memory 
has to be locked in place (where it can never be paged out), since an 
implementation on top of the filesystem does not have to do such locking 
- but you are also right that a high quality-of-implementation will 
strive to keep the memory live rather than paging it out precisely 
because it is used for interprocess communication that would be 
penalized if it can be paged out.

> 
> If it is correct, then I think shm_open() is exactly what I *don't* want
> for this use case. Because, while I do need a pathname for an
> mmap()-able object (regular file, or otherwise), just so I can do:
> 
>    -object memory-backend-file,id=mem-obj,...,mem-path=... \
>    -device ivshmem-plain,memdev=mem-obj,...
> 
> , I want the underlying object to put as little pressure on the system
> that runs the test suite as possible.
> 
> This means I should specifically ask for a regular file, to be mmap()'d
> (with MAP_SHARED). Then the kernel knows in advance that it can always
> page out the dirty stuff, and the mapping shouldn't clash with cgroups,
> or disabled memory overcommit.

Indeed, shmem CAN be a thin veneer on top of the file system, and 
support being paged out; but since an implementation that pins the 
memory such that it cannot page is permitted (and in fact maybe 
desirable), you are right that using shmem can indeed put pressure on 
different resources in relation to what you can accomplish by using the 
file system yourself.

> 
> Now, in order to make that actually safe, I should in theory ask for
> preallocation on the filesystem (otherwise, if the filesystem runs out
> of space, while the kernel is allocating fs extents in order to flush
> the dirty pages to them, the process gets a SIGBUS, IIRC). However,
> because I know that nothing will be in fact dirtied, I can minimize the
> footprint on the filesystem as well, and forego preallocation too.
> 
> This suggests that, in my test case,
> - I call g_file_open_tmp() for creating the temporary file,
> - pass the returned fd to ftruncate() for resizing the temporary file,
> - pass the returned pathname to the "memory-backend-file" objects, in
>    the "mem-path" property,
> - set "share=on",
> - set "prealloc=off",
> - "discard-data" is irrelevant (there won't be any dirty pages).
> 
> Thanks
> Laszlo
>
Eric Blake Sept. 27, 2018, 3:24 p.m. UTC | #20
On 9/26/18 3:35 PM, Laszlo Ersek wrote:
> (+Eric)

> This too sounds useful. AIUI, ftruncate() is neither forbidden, nor
> required, to allocate filesystem extents when increasing the size of a
> file. Using one smaller regular temporary file as the common foundation
> for multiple "memory-backend-file" objects will save space on the fs if
> ftruncate() happens to allocate extents.

Correct. Most likely, ftruncate() extends by creating a hole if the file 
system supports sparse files, but POSIX has no way to guarantee holes 
(it merely states that both ftruncate() and write() beyond EOF are the 
two POSIX interfaces that may create holes - but that it is up to the 
implementation whether holes actually get created, then fstat() and 
lseek(SEEK_HOLE) can inform you whether holes were actually created [the 
former when st_blocks < st_size]). posix_fallocate() exists to force 
allocation, and Linux fallocate() offers even more fine-tuning when you 
need specific hole or non-hole behaviors.
Laszlo Ersek Sept. 27, 2018, 4:26 p.m. UTC | #21
On 09/27/18 17:24, Eric Blake wrote:
> On 9/26/18 3:35 PM, Laszlo Ersek wrote:
>> (+Eric)
> 
>> This too sounds useful. AIUI, ftruncate() is neither forbidden, nor
>> required, to allocate filesystem extents when increasing the size of a
>> file. Using one smaller regular temporary file as the common foundation
>> for multiple "memory-backend-file" objects will save space on the fs if
>> ftruncate() happens to allocate extents.
> 
> Correct. Most likely, ftruncate() extends by creating a hole if the file
> system supports sparse files, but POSIX has no way to guarantee holes
> (it merely states that both ftruncate() and write() beyond EOF are the
> two POSIX interfaces that may create holes - but that it is up to the
> implementation whether holes actually get created, then fstat() and
> lseek(SEEK_HOLE) can inform you whether holes were actually created [the
> former when st_blocks < st_size]). posix_fallocate() exists to force
> allocation, and Linux fallocate() offers even more fine-tuning when you
> need specific hole or non-hole behaviors.
> 

Thank you for the answers!
Laszlo
Laszlo Ersek Sept. 27, 2018, 7:05 p.m. UTC | #22
On 09/27/18 09:03, Igor Mammedov wrote:
> On Thu, 27 Sep 2018 00:31:07 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:

>> - I guess I could ascertain the mis-alignment by using small guest RAM
>> (128MB), a single DIMM hotplug slot so that reserved-memory-end is
>> rounded up to 5GB (through the 1GB alignment), and a single ivshmem
>> device with a 2GB BAR (mis-aligned with the reserved-memory-end at 5GB).
>> Is this the best compromise?
>>
>> (The last option is basically an optimization of my previous reproducer,
>> namely 5GB guest RAM and 4GB BAR. The 5GB guest RAM made sure that RAM
>> would end at an *odd* GB number -- we can optimize that by using an
>> empty DIMM hotplug area rather than guest RAM. And the 4GB BAR can be
>> optimized to the smallest *even* (positive) number of GBs, namely 2GB.
> that probably would do the job

There's some black magic going on in the sizing of the reserved memory
area, due to at least commit 085f8e88ba73 ("pc: count in 1Gb hugepage
alignment when sizing hotplug-memory container", 2014-11-24). At this
point I'm basically trying to find whatever numbers that (a) don't take
up many resources, (b) reproduce the issue. :)

Thanks
Laszlo
diff mbox series

Patch

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 0df91e002076..fb4b0669ac9f 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -285,7 +285,7 @@  static void i440fx_pcihost_get_pci_hole64_end(Object *obj, Visitor *v,
 {
     PCIHostState *h = PCI_HOST_BRIDGE(obj);
     I440FXState *s = I440FX_PCI_HOST_BRIDGE(obj);
-    uint64_t hole64_start = pc_pci_hole64_start();
+    uint64_t hole64_start = i440fx_pcihost_get_pci_hole64_start_value(obj);
     Range w64;
     uint64_t value, hole64_end;
 
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 8acf942b5e65..1c5433161f14 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -145,7 +145,7 @@  static void q35_host_get_pci_hole64_end(Object *obj, Visitor *v,
 {
     PCIHostState *h = PCI_HOST_BRIDGE(obj);
     Q35PCIHost *s = Q35_HOST_DEVICE(obj);
-    uint64_t hole64_start = pc_pci_hole64_start();
+    uint64_t hole64_start = q35_host_get_pci_hole64_start_value(obj);
     Range w64;
     uint64_t value, hole64_end;