[RFC,3/3] hw/acpi: Extract build_mcfg

Message ID	20190313044253.31988-4-richardw.yang@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Wei Yang <richardw.yang@linux.intel.com> To: qemu-devel@nongnu.org, qemu-arm@nongnu.org Date: Wed, 13 Mar 2019 12:42:53 +0800 Message-Id: <20190313044253.31988-4-richardw.yang@linux.intel.com> In-Reply-To: <20190313044253.31988-1-richardw.yang@linux.intel.com> References: <20190313044253.31988-1-richardw.yang@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [Qemu-devel] [RFC PATCH 3/3] hw/acpi: Extract build_mcfg Precedence: list Cc: peter.maydell@linaro.org, mst@redhat.com, shannon.zhaosl@gmail.com, Wei Yang <richardw.yang@linux.intel.com>, imammedo@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Series	Extract build_mcfg \| expand [RFC,0/3] Extract build_mcfg [RFC,1/3] hw/arm/virt-acpi-build: use acpi_get_mcfg() to calculate bus number [RFC,2/3] hw/arm/virt-acpi-build: remove unnecessary variable mcfg_start [RFC,3/3] hw/acpi: Extract build_mcfg

Wei Yang March 13, 2019, 4:42 a.m. UTC

Now we have two identical build_mcfg function.

Extract them to aml-build.c.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
 hw/arm/virt-acpi-build.c    | 16 ----------------
 hw/i386/acpi-build.c        | 31 +------------------------------
 include/hw/acpi/aml-build.h |  1 +
 4 files changed, 32 insertions(+), 46 deletions(-)

Igor Mammedov March 13, 2019, 12:23 p.m. UTC | #1

On Wed, 13 Mar 2019 12:42:53 +0800
Wei Yang <richardw.yang@linux.intel.com> wrote:

> Now we have two identical build_mcfg function.
> 
> Extract them to aml-build.c.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> ---
>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
>  hw/arm/virt-acpi-build.c    | 16 ----------------
>  hw/i386/acpi-build.c        | 31 +------------------------------
>  include/hw/acpi/aml-build.h |  1 +
>  4 files changed, 32 insertions(+), 46 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 555c24f21d..58d3b8f31d 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c

I don't like polluting aml-build.c with PCI stuff,
we have a lot of PCI related code that needs generalizing
lets create a new file for that, something like
hw/acpi/pci.c + include/hw/acpi/pci.h

> @@ -25,6 +25,7 @@
>  #include "qemu/bswap.h"
>  #include "qemu/bitops.h"
>  #include "sysemu/numa.h"
> +#include "hw/pci/pcie_host.h"
>  
>  static GArray *build_alloc_array(void)
>  {
> @@ -1870,3 +1871,32 @@ build_hdr:
>      build_header(linker, tbl, (void *)(tbl->data + fadt_start),
>                   "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
>  }
> +
> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
> +{
> +    AcpiTableMcfg *mcfg;
> +    const char *sig;
> +    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
> +
> +    mcfg = acpi_data_push(table_data, len);
> +    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
> +    /* Only a single allocation so no need to play with segments */
> +    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
> +    mcfg->allocation[0].start_bus_number = 0;
> +    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);

> +    /*
> +     * MCFG is used for ECAM which can be enabled or disabled by guest.
> +     * To avoid table size changes (which create migration issues),
> +     * always create the table even if there are no allocations,
> +     * but set the signature to a reserved value in this case.
> +     * ACPI spec requires OSPMs to ignore such tables.
> +     */
> +    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
> +        /* Reserved signature: ignored by OSPM */
> +        sig = "QEMU";
> +    } else {
> +        sig = "MCFG";
> +    }
I'd leave these hack at acpi-build.c, just push it up call chain.
More over we don't really need it since resizeable memory region was introduced.

So we need to keep table_blob size only for legacy usecase (pre resizable)
and for that just padding table_blob on required size would be sufficient,
there is no need to create dummy QEMU table.
As for newer machines (since resizeable memory region) we don't need to
do even that i.e. just skip table generation altogether if guest disabled it.

> +    build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
> +}
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index ae7858a79a..92d8fccb00 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -557,22 +557,6 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>      return true;
>  }
>  
> -static void
> -build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
> -{
> -    AcpiTableMcfg *mcfg;
> -    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
> -
> -    mcfg = acpi_data_push(table_data, len);
> -    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
> -    /* Only a single allocation so no need to play with segments */
> -    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
> -    mcfg->allocation[0].start_bus_number = 0;
> -    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);
> -
> -    build_header(linker, table_data, (void *)mcfg, "MCFG", len, 1, NULL, NULL);
> -}
> -
>  /* GTDT */
>  static void
>  build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index c5b1c3be99..b537a39d42 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2392,35 +2392,6 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>                   table_data->len - srat_start, 1, NULL, NULL);
>  }
>  
> -static void
> -build_mcfg_q35(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
> -{
> -    AcpiTableMcfg *mcfg;
> -    const char *sig;
> -    int len = sizeof(*mcfg) + 1 * sizeof(mcfg->allocation[0]);
> -
> -    mcfg = acpi_data_push(table_data, len);
> -    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
> -    /* Only a single allocation so no need to play with segments */
> -    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
> -    mcfg->allocation[0].start_bus_number = 0;
> -    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);
> -
> -    /* MCFG is used for ECAM which can be enabled or disabled by guest.
> -     * To avoid table size changes (which create migration issues),
> -     * always create the table even if there are no allocations,
> -     * but set the signature to a reserved value in this case.
> -     * ACPI spec requires OSPMs to ignore such tables.
> -     */
> -    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
> -        /* Reserved signature: ignored by OSPM */
> -        sig = "QEMU";
> -    } else {
> -        sig = "MCFG";
> -    }
> -    build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
> -}
> -
>  /*
>   * VT-d spec 8.1 DMA Remapping Reporting Structure
>   * (version Oct. 2014 or later)
> @@ -2687,7 +2658,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>      }
>      if (acpi_get_mcfg(&mcfg)) {
>          acpi_add_table(table_offsets, tables_blob);
> -        build_mcfg_q35(tables_blob, tables->linker, &mcfg);
> +        build_mcfg(tables_blob, tables->linker, &mcfg);
>      }
>      if (x86_iommu_get_default()) {
>          IommuType IOMMUType = x86_iommu_get_type();
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index b63b85d67c..8f2ea3679f 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -423,4 +423,5 @@ void build_slit(GArray *table_data, BIOSLinker *linker);
>  
>  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>                  const char *oem_id, const char *oem_table_id);
> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info);
>  #endif

Wei Yang March 13, 2019, 1:33 p.m. UTC | #2

On Wed, Mar 13, 2019 at 01:23:00PM +0100, Igor Mammedov wrote:
>On Wed, 13 Mar 2019 12:42:53 +0800
>Wei Yang <richardw.yang@linux.intel.com> wrote:
>
>> Now we have two identical build_mcfg function.
>> 
>> Extract them to aml-build.c.
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> ---
>>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
>>  hw/arm/virt-acpi-build.c    | 16 ----------------
>>  hw/i386/acpi-build.c        | 31 +------------------------------
>>  include/hw/acpi/aml-build.h |  1 +
>>  4 files changed, 32 insertions(+), 46 deletions(-)
>> 
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 555c24f21d..58d3b8f31d 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>
>I don't like polluting aml-build.c with PCI stuff,
>we have a lot of PCI related code that needs generalizing
>lets create a new file for that, something like
>hw/acpi/pci.c + include/hw/acpi/pci.h
>
>> @@ -25,6 +25,7 @@
>>  #include "qemu/bswap.h"
>>  #include "qemu/bitops.h"
>>  #include "sysemu/numa.h"
>> +#include "hw/pci/pcie_host.h"
>>  
>>  static GArray *build_alloc_array(void)
>>  {
>> @@ -1870,3 +1871,32 @@ build_hdr:
>>      build_header(linker, tbl, (void *)(tbl->data + fadt_start),
>>                   "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
>>  }
>> +
>> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
>> +{
>> +    AcpiTableMcfg *mcfg;
>> +    const char *sig;
>> +    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
>> +
>> +    mcfg = acpi_data_push(table_data, len);
>> +    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
>> +    /* Only a single allocation so no need to play with segments */
>> +    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
>> +    mcfg->allocation[0].start_bus_number = 0;
>> +    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);
>
>> +    /*
>> +     * MCFG is used for ECAM which can be enabled or disabled by guest.
>> +     * To avoid table size changes (which create migration issues),
>> +     * always create the table even if there are no allocations,
>> +     * but set the signature to a reserved value in this case.
>> +     * ACPI spec requires OSPMs to ignore such tables.
>> +     */
>> +    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
>> +        /* Reserved signature: ignored by OSPM */
>> +        sig = "QEMU";
>> +    } else {
>> +        sig = "MCFG";
>> +    }
>I'd leave these hack at acpi-build.c, just push it up call chain.

Assign sig in acpi-build.c and pass it to build_mcfg()?

>More over we don't really need it since resizeable memory region was introduced.
>
>So we need to keep table_blob size only for legacy usecase (pre resizable)
>and for that just padding table_blob on required size would be sufficient,
>there is no need to create dummy QEMU table.
>As for newer machines (since resizeable memory region) we don't need to
>do even that i.e. just skip table generation altogether if guest disabled it.
>

I am lost at this place.

sig is a part of ACPI table header, you mean the sig is not necessary to
be set in ACPI table header?

"skip table generation" means remove build_header() in build_mcfg()?

>> +    build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
>> +}
>> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
>> index ae7858a79a..92d8fccb00 100644
>> --- a/hw/arm/virt-acpi-build.c
>> +++ b/hw/arm/virt-acpi-build.c
>> @@ -557,22 +557,6 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>>      return true;
>>  }
>>  
>> -static void
>> -build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
>> -{
>> -    AcpiTableMcfg *mcfg;
>> -    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
>> -
>> -    mcfg = acpi_data_push(table_data, len);
>> -    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
>> -    /* Only a single allocation so no need to play with segments */
>> -    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
>> -    mcfg->allocation[0].start_bus_number = 0;
>> -    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);
>> -
>> -    build_header(linker, table_data, (void *)mcfg, "MCFG", len, 1, NULL, NULL);
>> -}
>> -
>>  /* GTDT */
>>  static void
>>  build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index c5b1c3be99..b537a39d42 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2392,35 +2392,6 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>>                   table_data->len - srat_start, 1, NULL, NULL);
>>  }
>>  
>> -static void
>> -build_mcfg_q35(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
>> -{
>> -    AcpiTableMcfg *mcfg;
>> -    const char *sig;
>> -    int len = sizeof(*mcfg) + 1 * sizeof(mcfg->allocation[0]);
>> -
>> -    mcfg = acpi_data_push(table_data, len);
>> -    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
>> -    /* Only a single allocation so no need to play with segments */
>> -    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
>> -    mcfg->allocation[0].start_bus_number = 0;
>> -    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);
>> -
>> -    /* MCFG is used for ECAM which can be enabled or disabled by guest.
>> -     * To avoid table size changes (which create migration issues),
>> -     * always create the table even if there are no allocations,
>> -     * but set the signature to a reserved value in this case.
>> -     * ACPI spec requires OSPMs to ignore such tables.
>> -     */
>> -    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
>> -        /* Reserved signature: ignored by OSPM */
>> -        sig = "QEMU";
>> -    } else {
>> -        sig = "MCFG";
>> -    }
>> -    build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
>> -}
>> -
>>  /*
>>   * VT-d spec 8.1 DMA Remapping Reporting Structure
>>   * (version Oct. 2014 or later)
>> @@ -2687,7 +2658,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>      }
>>      if (acpi_get_mcfg(&mcfg)) {
>>          acpi_add_table(table_offsets, tables_blob);
>> -        build_mcfg_q35(tables_blob, tables->linker, &mcfg);
>> +        build_mcfg(tables_blob, tables->linker, &mcfg);
>>      }
>>      if (x86_iommu_get_default()) {
>>          IommuType IOMMUType = x86_iommu_get_type();
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index b63b85d67c..8f2ea3679f 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -423,4 +423,5 @@ void build_slit(GArray *table_data, BIOSLinker *linker);
>>  
>>  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
>>                  const char *oem_id, const char *oem_table_id);
>> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info);
>>  #endif
>

Igor Mammedov March 13, 2019, 4:09 p.m. UTC | #3

On Wed, 13 Mar 2019 13:33:59 +0000
Wei Yang <richard.weiyang@gmail.com> wrote:

> On Wed, Mar 13, 2019 at 01:23:00PM +0100, Igor Mammedov wrote:
> >On Wed, 13 Mar 2019 12:42:53 +0800
> >Wei Yang <richardw.yang@linux.intel.com> wrote:
> >  
> >> Now we have two identical build_mcfg function.
> >> 
> >> Extract them to aml-build.c.
> >> 
> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> ---
> >>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
> >>  hw/arm/virt-acpi-build.c    | 16 ----------------
> >>  hw/i386/acpi-build.c        | 31 +------------------------------
> >>  include/hw/acpi/aml-build.h |  1 +
> >>  4 files changed, 32 insertions(+), 46 deletions(-)
> >> 
> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> >> index 555c24f21d..58d3b8f31d 100644
> >> --- a/hw/acpi/aml-build.c
> >> +++ b/hw/acpi/aml-build.c  
> >
> >I don't like polluting aml-build.c with PCI stuff,
> >we have a lot of PCI related code that needs generalizing
> >lets create a new file for that, something like
> >hw/acpi/pci.c + include/hw/acpi/pci.h
> >  
> >> @@ -25,6 +25,7 @@
> >>  #include "qemu/bswap.h"
> >>  #include "qemu/bitops.h"
> >>  #include "sysemu/numa.h"
> >> +#include "hw/pci/pcie_host.h"
> >>  
> >>  static GArray *build_alloc_array(void)
> >>  {
> >> @@ -1870,3 +1871,32 @@ build_hdr:
> >>      build_header(linker, tbl, (void *)(tbl->data + fadt_start),
> >>                   "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
> >>  }
> >> +
> >> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
> >> +{
> >> +    AcpiTableMcfg *mcfg;
> >> +    const char *sig;
> >> +    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
> >> +
> >> +    mcfg = acpi_data_push(table_data, len);
> >> +    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
> >> +    /* Only a single allocation so no need to play with segments */
> >> +    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
> >> +    mcfg->allocation[0].start_bus_number = 0;
> >> +    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);  
> >  
> >> +    /*
> >> +     * MCFG is used for ECAM which can be enabled or disabled by guest.
> >> +     * To avoid table size changes (which create migration issues),
> >> +     * always create the table even if there are no allocations,
> >> +     * but set the signature to a reserved value in this case.
> >> +     * ACPI spec requires OSPMs to ignore such tables.
> >> +     */
> >> +    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
> >> +        /* Reserved signature: ignored by OSPM */
> >> +        sig = "QEMU";
> >> +    } else {
> >> +        sig = "MCFG";
> >> +    }  
> >I'd leave these hack at acpi-build.c, just push it up call chain.  
> 
> Assign sig in acpi-build.c and pass it to build_mcfg()?
nope, see more below

 
> >More over we don't really need it since resizeable memory region was introduced.
> >
> >So we need to keep table_blob size only for legacy usecase (pre resizable)
> >and for that just padding table_blob on required size would be sufficient,
> >there is no need to create dummy QEMU table.
> >As for newer machines (since resizeable memory region) we don't need to
> >do even that i.e. just skip table generation altogether if guest disabled it.
> >  
> 
> I am lost at this place.
> 
> sig is a part of ACPI table header, you mean the sig is not necessary to
> be set in ACPI table header?
> 
> "skip table generation" means remove build_header() in build_mcfg()?
I mean do not call build_mcfg() at all when you don't have to.

And when you need to keep table_blob the same size (for old machines)
using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
might just work as well. it's still hack but it can live in x86 specific
acpi_build() keeping build_mcfg() generic.

As for defining what to use as criteria to decide when we need to keep
table_blob size the same, I don't remember history of it, so I'd suggest
to look at commit a1666142, study history of acpi_ram_update() and
legacy_acpi_table_size to figure out since which machine type one doesn't
have to keep table_blob size the same.


> 
> >> +    build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
> >> +}
> >> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> >> index ae7858a79a..92d8fccb00 100644
> >> --- a/hw/arm/virt-acpi-build.c
> >> +++ b/hw/arm/virt-acpi-build.c
> >> @@ -557,22 +557,6 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
> >>      return true;
> >>  }
> >>  
> >> -static void
> >> -build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
> >> -{
> >> -    AcpiTableMcfg *mcfg;
> >> -    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
> >> -
> >> -    mcfg = acpi_data_push(table_data, len);
> >> -    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
> >> -    /* Only a single allocation so no need to play with segments */
> >> -    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
> >> -    mcfg->allocation[0].start_bus_number = 0;
> >> -    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);
> >> -
> >> -    build_header(linker, table_data, (void *)mcfg, "MCFG", len, 1, NULL, NULL);
> >> -}
> >> -
> >>  /* GTDT */
> >>  static void
> >>  build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >> index c5b1c3be99..b537a39d42 100644
> >> --- a/hw/i386/acpi-build.c
> >> +++ b/hw/i386/acpi-build.c
> >> @@ -2392,35 +2392,6 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
> >>                   table_data->len - srat_start, 1, NULL, NULL);
> >>  }
> >>  
> >> -static void
> >> -build_mcfg_q35(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
> >> -{
> >> -    AcpiTableMcfg *mcfg;
> >> -    const char *sig;
> >> -    int len = sizeof(*mcfg) + 1 * sizeof(mcfg->allocation[0]);
> >> -
> >> -    mcfg = acpi_data_push(table_data, len);
> >> -    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
> >> -    /* Only a single allocation so no need to play with segments */
> >> -    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
> >> -    mcfg->allocation[0].start_bus_number = 0;
> >> -    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);
> >> -
> >> -    /* MCFG is used for ECAM which can be enabled or disabled by guest.
> >> -     * To avoid table size changes (which create migration issues),
> >> -     * always create the table even if there are no allocations,
> >> -     * but set the signature to a reserved value in this case.
> >> -     * ACPI spec requires OSPMs to ignore such tables.
> >> -     */
> >> -    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
> >> -        /* Reserved signature: ignored by OSPM */
> >> -        sig = "QEMU";
> >> -    } else {
> >> -        sig = "MCFG";
> >> -    }
> >> -    build_header(linker, table_data, (void *)mcfg, sig, len, 1, NULL, NULL);
> >> -}
> >> -
> >>  /*
> >>   * VT-d spec 8.1 DMA Remapping Reporting Structure
> >>   * (version Oct. 2014 or later)
> >> @@ -2687,7 +2658,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
> >>      }
> >>      if (acpi_get_mcfg(&mcfg)) {
> >>          acpi_add_table(table_offsets, tables_blob);
> >> -        build_mcfg_q35(tables_blob, tables->linker, &mcfg);
> >> +        build_mcfg(tables_blob, tables->linker, &mcfg);
> >>      }
> >>      if (x86_iommu_get_default()) {
> >>          IommuType IOMMUType = x86_iommu_get_type();
> >> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> >> index b63b85d67c..8f2ea3679f 100644
> >> --- a/include/hw/acpi/aml-build.h
> >> +++ b/include/hw/acpi/aml-build.h
> >> @@ -423,4 +423,5 @@ void build_slit(GArray *table_data, BIOSLinker *linker);
> >>  
> >>  void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
> >>                  const char *oem_id, const char *oem_table_id);
> >> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info);
> >>  #endif  
> >  
>

Wei Yang March 13, 2019, 9:31 p.m. UTC | #4

On Wed, Mar 13, 2019 at 05:09:43PM +0100, Igor Mammedov wrote:
>On Wed, 13 Mar 2019 13:33:59 +0000
>Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> On Wed, Mar 13, 2019 at 01:23:00PM +0100, Igor Mammedov wrote:
>> >On Wed, 13 Mar 2019 12:42:53 +0800
>> >Wei Yang <richardw.yang@linux.intel.com> wrote:
>> >  
>> >> Now we have two identical build_mcfg function.
>> >> 
>> >> Extract them to aml-build.c.
>> >> 
>> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> ---
>> >>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
>> >>  hw/arm/virt-acpi-build.c    | 16 ----------------
>> >>  hw/i386/acpi-build.c        | 31 +------------------------------
>> >>  include/hw/acpi/aml-build.h |  1 +
>> >>  4 files changed, 32 insertions(+), 46 deletions(-)
>> >> 
>> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> >> index 555c24f21d..58d3b8f31d 100644
>> >> --- a/hw/acpi/aml-build.c
>> >> +++ b/hw/acpi/aml-build.c  
>> >
>> >I don't like polluting aml-build.c with PCI stuff,
>> >we have a lot of PCI related code that needs generalizing
>> >lets create a new file for that, something like
>> >hw/acpi/pci.c + include/hw/acpi/pci.h
>> >  
>> >> @@ -25,6 +25,7 @@
>> >>  #include "qemu/bswap.h"
>> >>  #include "qemu/bitops.h"
>> >>  #include "sysemu/numa.h"
>> >> +#include "hw/pci/pcie_host.h"
>> >>  
>> >>  static GArray *build_alloc_array(void)
>> >>  {
>> >> @@ -1870,3 +1871,32 @@ build_hdr:
>> >>      build_header(linker, tbl, (void *)(tbl->data + fadt_start),
>> >>                   "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
>> >>  }
>> >> +
>> >> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
>> >> +{
>> >> +    AcpiTableMcfg *mcfg;
>> >> +    const char *sig;
>> >> +    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
>> >> +
>> >> +    mcfg = acpi_data_push(table_data, len);
>> >> +    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
>> >> +    /* Only a single allocation so no need to play with segments */
>> >> +    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
>> >> +    mcfg->allocation[0].start_bus_number = 0;
>> >> +    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);  
>> >  
>> >> +    /*
>> >> +     * MCFG is used for ECAM which can be enabled or disabled by guest.
>> >> +     * To avoid table size changes (which create migration issues),
>> >> +     * always create the table even if there are no allocations,
>> >> +     * but set the signature to a reserved value in this case.
>> >> +     * ACPI spec requires OSPMs to ignore such tables.
>> >> +     */
>> >> +    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
>> >> +        /* Reserved signature: ignored by OSPM */
>> >> +        sig = "QEMU";
>> >> +    } else {
>> >> +        sig = "MCFG";
>> >> +    }  
>> >I'd leave these hack at acpi-build.c, just push it up call chain.  
>> 
>> Assign sig in acpi-build.c and pass it to build_mcfg()?
>nope, see more below
>
> 
>> >More over we don't really need it since resizeable memory region was introduced.
>> >
>> >So we need to keep table_blob size only for legacy usecase (pre resizable)
>> >and for that just padding table_blob on required size would be sufficient,
>> >there is no need to create dummy QEMU table.
>> >As for newer machines (since resizeable memory region) we don't need to
>> >do even that i.e. just skip table generation altogether if guest disabled it.
>> >  
>> 
>> I am lost at this place.
>> 
>> sig is a part of ACPI table header, you mean the sig is not necessary to
>> be set in ACPI table header?
>> 
>> "skip table generation" means remove build_header() in build_mcfg()?
>I mean do not call build_mcfg() at all when you don't have to.
>
>And when you need to keep table_blob the same size (for old machines)
>using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
>might just work as well. it's still hack but it can live in x86 specific
>acpi_build() keeping build_mcfg() generic.
>

Seems got your idea.

>As for defining what to use as criteria to decide when we need to keep
>table_blob size the same, I don't remember history of it, so I'd suggest
>to look at commit a1666142, study history of acpi_ram_update() and
>legacy_acpi_table_size to figure out since which machine type one doesn't
>have to keep table_blob size the same.
>

OK, let me study the history first.

BTW, the legacy here is hardware specification level or qemu software
design level?

Igor Mammedov March 14, 2019, 9:18 a.m. UTC | #5

On Wed, 13 Mar 2019 21:31:37 +0000
Wei Yang <richard.weiyang@gmail.com> wrote:

> On Wed, Mar 13, 2019 at 05:09:43PM +0100, Igor Mammedov wrote:
> >On Wed, 13 Mar 2019 13:33:59 +0000
> >Wei Yang <richard.weiyang@gmail.com> wrote:
> >  
> >> On Wed, Mar 13, 2019 at 01:23:00PM +0100, Igor Mammedov wrote:  
> >> >On Wed, 13 Mar 2019 12:42:53 +0800
> >> >Wei Yang <richardw.yang@linux.intel.com> wrote:
> >> >    
> >> >> Now we have two identical build_mcfg function.
> >> >> 
> >> >> Extract them to aml-build.c.
> >> >> 
> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> >> >> ---
> >> >>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
> >> >>  hw/arm/virt-acpi-build.c    | 16 ----------------
> >> >>  hw/i386/acpi-build.c        | 31 +------------------------------
> >> >>  include/hw/acpi/aml-build.h |  1 +
> >> >>  4 files changed, 32 insertions(+), 46 deletions(-)
> >> >> 
> >> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> >> >> index 555c24f21d..58d3b8f31d 100644
> >> >> --- a/hw/acpi/aml-build.c
> >> >> +++ b/hw/acpi/aml-build.c    
> >> >
> >> >I don't like polluting aml-build.c with PCI stuff,
> >> >we have a lot of PCI related code that needs generalizing
> >> >lets create a new file for that, something like
> >> >hw/acpi/pci.c + include/hw/acpi/pci.h
> >> >    
> >> >> @@ -25,6 +25,7 @@
> >> >>  #include "qemu/bswap.h"
> >> >>  #include "qemu/bitops.h"
> >> >>  #include "sysemu/numa.h"
> >> >> +#include "hw/pci/pcie_host.h"
> >> >>  
> >> >>  static GArray *build_alloc_array(void)
> >> >>  {
> >> >> @@ -1870,3 +1871,32 @@ build_hdr:
> >> >>      build_header(linker, tbl, (void *)(tbl->data + fadt_start),
> >> >>                   "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
> >> >>  }
> >> >> +
> >> >> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
> >> >> +{
> >> >> +    AcpiTableMcfg *mcfg;
> >> >> +    const char *sig;
> >> >> +    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
> >> >> +
> >> >> +    mcfg = acpi_data_push(table_data, len);
> >> >> +    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
> >> >> +    /* Only a single allocation so no need to play with segments */
> >> >> +    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
> >> >> +    mcfg->allocation[0].start_bus_number = 0;
> >> >> +    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);    
> >> >    
> >> >> +    /*
> >> >> +     * MCFG is used for ECAM which can be enabled or disabled by guest.
> >> >> +     * To avoid table size changes (which create migration issues),
> >> >> +     * always create the table even if there are no allocations,
> >> >> +     * but set the signature to a reserved value in this case.
> >> >> +     * ACPI spec requires OSPMs to ignore such tables.
> >> >> +     */
> >> >> +    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
> >> >> +        /* Reserved signature: ignored by OSPM */
> >> >> +        sig = "QEMU";
> >> >> +    } else {
> >> >> +        sig = "MCFG";
> >> >> +    }    
> >> >I'd leave these hack at acpi-build.c, just push it up call chain.    
> >> 
> >> Assign sig in acpi-build.c and pass it to build_mcfg()?  
> >nope, see more below
> >
> >   
> >> >More over we don't really need it since resizeable memory region was introduced.
> >> >
> >> >So we need to keep table_blob size only for legacy usecase (pre resizable)
> >> >and for that just padding table_blob on required size would be sufficient,
> >> >there is no need to create dummy QEMU table.
> >> >As for newer machines (since resizeable memory region) we don't need to
> >> >do even that i.e. just skip table generation altogether if guest disabled it.
> >> >    
> >> 
> >> I am lost at this place.
> >> 
> >> sig is a part of ACPI table header, you mean the sig is not necessary to
> >> be set in ACPI table header?
> >> 
> >> "skip table generation" means remove build_header() in build_mcfg()?  
> >I mean do not call build_mcfg() at all when you don't have to.
> >
> >And when you need to keep table_blob the same size (for old machines)
> >using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
> >might just work as well. it's still hack but it can live in x86 specific
> >acpi_build() keeping build_mcfg() generic.
> >  
> 
> Seems got your idea.
> 
> >As for defining what to use as criteria to decide when we need to keep
> >table_blob size the same, I don't remember history of it, so I'd suggest
> >to look at commit a1666142, study history of acpi_ram_update() and
> >legacy_acpi_table_size to figure out since which machine type one doesn't
> >have to keep table_blob size the same.
> >  
> 
> OK, let me study the history first.
> 
> BTW, the legacy here is hardware specification level or qemu software
> design level?
it's QEMU only, you need to find a version of QEMU (machine type)
which didn't have re-sizable MemoryRegion and the next version most likely
would have a knob somewhere in machine class definition saying that we don't
care about sizes anymore or care about sizes only for previous machine types.

Wei Yang March 14, 2019, 11:54 a.m. UTC | #6

On Thu, Mar 14, 2019 at 10:18:30AM +0100, Igor Mammedov wrote:
>On Wed, 13 Mar 2019 21:31:37 +0000
>Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> On Wed, Mar 13, 2019 at 05:09:43PM +0100, Igor Mammedov wrote:
>> >On Wed, 13 Mar 2019 13:33:59 +0000
>> >Wei Yang <richard.weiyang@gmail.com> wrote:
>> >  
>> >> On Wed, Mar 13, 2019 at 01:23:00PM +0100, Igor Mammedov wrote:  
>> >> >On Wed, 13 Mar 2019 12:42:53 +0800
>> >> >Wei Yang <richardw.yang@linux.intel.com> wrote:
>> >> >    
>> >> >> Now we have two identical build_mcfg function.
>> >> >> 
>> >> >> Extract them to aml-build.c.
>> >> >> 
>> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> >> ---
>> >> >>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
>> >> >>  hw/arm/virt-acpi-build.c    | 16 ----------------
>> >> >>  hw/i386/acpi-build.c        | 31 +------------------------------
>> >> >>  include/hw/acpi/aml-build.h |  1 +
>> >> >>  4 files changed, 32 insertions(+), 46 deletions(-)
>> >> >> 
>> >> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> >> >> index 555c24f21d..58d3b8f31d 100644
>> >> >> --- a/hw/acpi/aml-build.c
>> >> >> +++ b/hw/acpi/aml-build.c    
>> >> >
>> >> >I don't like polluting aml-build.c with PCI stuff,
>> >> >we have a lot of PCI related code that needs generalizing
>> >> >lets create a new file for that, something like
>> >> >hw/acpi/pci.c + include/hw/acpi/pci.h
>> >> >    
>> >> >> @@ -25,6 +25,7 @@
>> >> >>  #include "qemu/bswap.h"
>> >> >>  #include "qemu/bitops.h"
>> >> >>  #include "sysemu/numa.h"
>> >> >> +#include "hw/pci/pcie_host.h"
>> >> >>  
>> >> >>  static GArray *build_alloc_array(void)
>> >> >>  {
>> >> >> @@ -1870,3 +1871,32 @@ build_hdr:
>> >> >>      build_header(linker, tbl, (void *)(tbl->data + fadt_start),
>> >> >>                   "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
>> >> >>  }
>> >> >> +
>> >> >> +void build_mcfg(GArray *table_data, BIOSLinker *linker, AcpiMcfgInfo *info)
>> >> >> +{
>> >> >> +    AcpiTableMcfg *mcfg;
>> >> >> +    const char *sig;
>> >> >> +    int len = sizeof(*mcfg) + sizeof(mcfg->allocation[0]);
>> >> >> +
>> >> >> +    mcfg = acpi_data_push(table_data, len);
>> >> >> +    mcfg->allocation[0].address = cpu_to_le64(info->mcfg_base);
>> >> >> +    /* Only a single allocation so no need to play with segments */
>> >> >> +    mcfg->allocation[0].pci_segment = cpu_to_le16(0);
>> >> >> +    mcfg->allocation[0].start_bus_number = 0;
>> >> >> +    mcfg->allocation[0].end_bus_number = PCIE_MMCFG_BUS(info->mcfg_size - 1);    
>> >> >    
>> >> >> +    /*
>> >> >> +     * MCFG is used for ECAM which can be enabled or disabled by guest.
>> >> >> +     * To avoid table size changes (which create migration issues),
>> >> >> +     * always create the table even if there are no allocations,
>> >> >> +     * but set the signature to a reserved value in this case.
>> >> >> +     * ACPI spec requires OSPMs to ignore such tables.
>> >> >> +     */
>> >> >> +    if (info->mcfg_base == PCIE_BASE_ADDR_UNMAPPED) {
>> >> >> +        /* Reserved signature: ignored by OSPM */
>> >> >> +        sig = "QEMU";
>> >> >> +    } else {
>> >> >> +        sig = "MCFG";
>> >> >> +    }    
>> >> >I'd leave these hack at acpi-build.c, just push it up call chain.    
>> >> 
>> >> Assign sig in acpi-build.c and pass it to build_mcfg()?  
>> >nope, see more below
>> >
>> >   
>> >> >More over we don't really need it since resizeable memory region was introduced.
>> >> >
>> >> >So we need to keep table_blob size only for legacy usecase (pre resizable)
>> >> >and for that just padding table_blob on required size would be sufficient,
>> >> >there is no need to create dummy QEMU table.
>> >> >As for newer machines (since resizeable memory region) we don't need to
>> >> >do even that i.e. just skip table generation altogether if guest disabled it.
>> >> >    
>> >> 
>> >> I am lost at this place.
>> >> 
>> >> sig is a part of ACPI table header, you mean the sig is not necessary to
>> >> be set in ACPI table header?
>> >> 
>> >> "skip table generation" means remove build_header() in build_mcfg()?  
>> >I mean do not call build_mcfg() at all when you don't have to.
>> >
>> >And when you need to keep table_blob the same size (for old machines)
>> >using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
>> >might just work as well. it's still hack but it can live in x86 specific
>> >acpi_build() keeping build_mcfg() generic.
>> >  
>> 
>> Seems got your idea.
>> 
>> >As for defining what to use as criteria to decide when we need to keep
>> >table_blob size the same, I don't remember history of it, so I'd suggest
>> >to look at commit a1666142, study history of acpi_ram_update() and
>> >legacy_acpi_table_size to figure out since which machine type one doesn't
>> >have to keep table_blob size the same.
>> >  
>> 
>> OK, let me study the history first.
>> 
>> BTW, the legacy here is hardware specification level or qemu software
>> design level?
>it's QEMU only, you need to find a version of QEMU (machine type)
>which didn't have re-sizable MemoryRegion and the next version most likely
>would have a knob somewhere in machine class definition saying that we don't
>care about sizes anymore or care about sizes only for previous machine types.

Ok, let me find when we introduced re-sizable MemoryRegion.

Seems I still have some uncertain concepts. Let me discover it one by
one :-)

Wei Yang March 16, 2019, 10:31 a.m. UTC | #7

On Thu, Mar 14, 2019 at 10:18:30AM +0100, Igor Mammedov wrote:
>On Wed, 13 Mar 2019 21:31:37 +0000
>Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> On Wed, Mar 13, 2019 at 05:09:43PM +0100, Igor Mammedov wrote:
>> >On Wed, 13 Mar 2019 13:33:59 +0000
>> >Wei Yang <richard.weiyang@gmail.com> wrote:
>> >  
>> >> On Wed, Mar 13, 2019 at 01:23:00PM +0100, Igor Mammedov wrote:  
>> >> >On Wed, 13 Mar 2019 12:42:53 +0800
>> >> >Wei Yang <richardw.yang@linux.intel.com> wrote:
>> >> >    
>> >> >> Now we have two identical build_mcfg function.
>> >> >> 
>> >> >> Extract them to aml-build.c.
>> >> >> 
>> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>> >> >> ---
>> >> >>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
>> >> >>  hw/arm/virt-acpi-build.c    | 16 ----------------
>> >> >>  hw/i386/acpi-build.c        | 31 +------------------------------
>> >> >>  include/hw/acpi/aml-build.h |  1 +
>> >> >>  4 files changed, 32 insertions(+), 46 deletions(-)
>> >> >> 
>> >> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> >> >> index 555c24f21d..58d3b8f31d 100644
>> >> >> --- a/hw/acpi/aml-build.c
>> >> >> +++ b/hw/acpi/aml-build.c    
>> >> >

[...]

>> >I mean do not call build_mcfg() at all when you don't have to.
>> >
>> >And when you need to keep table_blob the same size (for old machines)
>> >using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
>> >might just work as well. it's still hack but it can live in x86 specific
>> >acpi_build() keeping build_mcfg() generic.
>> >  
>> 
>> Seems got your idea.
>> 
>> >As for defining what to use as criteria to decide when we need to keep
>> >table_blob size the same, I don't remember history of it, so I'd suggest
>> >to look at commit a1666142, study history of acpi_ram_update() and
>> >legacy_acpi_table_size to figure out since which machine type one doesn't
>> >have to keep table_blob size the same.
>> >  
>> 
>> OK, let me study the history first.
>> 
>> BTW, the legacy here is hardware specification level or qemu software
>> design level?
>it's QEMU only, you need to find a version of QEMU (machine type)
>which didn't have re-sizable MemoryRegion and the next version most likely
>would have a knob somewhere in machine class definition saying that we don't
>care about sizes anymore or care about sizes only for previous machine types.

I have got something, but not sure this is correct. So I'd like to
summarize them here first.

The following commits are ordered in timeline.

1. First we introduced resizable MR.

    a1666142db 2015-01-08 acpi-build: make ROMs RAM blocks resizeable
    60786ef339 2015-01-08 memory: API to allocate resizeable RAM MR

2. Then acpi_ram_update in introduced

    339240b5cd 2015-04-27 acpi-build: remove dependency from ram_addr.h
    42d859001d 2015-02-26 acpi-build: fix ACPI RAM management

3. After that we introduce 2.4 machine type

    5cb50e0acc 2015-04-27 pc: add 2.4 machine types

But I don't find a comment stating we don't care about sizes anymore.

Below is my understanding of the dummy table history. 

Before resizable MR was introduced, the dummy table is used to be a
place holder so that we could add real table in acpi_build_update().
Othersize the new ACPI table is bigger to be placed into
build_state->table_mr.

If this is correct, after commit a1666142db ACPI table has the
capability to expand itself and not necessary to pre-allocate the dummy
table. Can I say after commit a1666142db, we don't need to put a fake
table in ACPI?

Another confusion is in the comment of build_mcfg(). In which aspect, it
relates to migration? Is this before resizable MR introduced?

Wei Yang March 20, 2019, 1:40 p.m. UTC | #8

On Sat, Mar 16, 2019 at 10:31:24AM +0000, Wei Yang wrote:
>On Thu, Mar 14, 2019 at 10:18:30AM +0100, Igor Mammedov wrote:
>>On Wed, 13 Mar 2019 21:31:37 +0000
>>Wei Yang <richard.weiyang@gmail.com> wrote:
>>
>>> On Wed, Mar 13, 2019 at 05:09:43PM +0100, Igor Mammedov wrote:
>>> >On Wed, 13 Mar 2019 13:33:59 +0000
>>> >Wei Yang <richard.weiyang@gmail.com> wrote:
>>> >  
>>> >> On Wed, Mar 13, 2019 at 01:23:00PM +0100, Igor Mammedov wrote:  
>>> >> >On Wed, 13 Mar 2019 12:42:53 +0800
>>> >> >Wei Yang <richardw.yang@linux.intel.com> wrote:
>>> >> >    
>>> >> >> Now we have two identical build_mcfg function.
>>> >> >> 
>>> >> >> Extract them to aml-build.c.
>>> >> >> 
>>> >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>>> >> >> ---
>>> >> >>  hw/acpi/aml-build.c         | 30 ++++++++++++++++++++++++++++++
>>> >> >>  hw/arm/virt-acpi-build.c    | 16 ----------------
>>> >> >>  hw/i386/acpi-build.c        | 31 +------------------------------
>>> >> >>  include/hw/acpi/aml-build.h |  1 +
>>> >> >>  4 files changed, 32 insertions(+), 46 deletions(-)
>>> >> >> 
>>> >> >> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>>> >> >> index 555c24f21d..58d3b8f31d 100644
>>> >> >> --- a/hw/acpi/aml-build.c
>>> >> >> +++ b/hw/acpi/aml-build.c    
>>> >> >
>
>[...]
>
>>> >I mean do not call build_mcfg() at all when you don't have to.
>>> >
>>> >And when you need to keep table_blob the same size (for old machines)
>>> >using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
>>> >might just work as well. it's still hack but it can live in x86 specific
>>> >acpi_build() keeping build_mcfg() generic.
>>> >  
>>> 
>>> Seems got your idea.
>>> 
>>> >As for defining what to use as criteria to decide when we need to keep
>>> >table_blob size the same, I don't remember history of it, so I'd suggest
>>> >to look at commit a1666142, study history of acpi_ram_update() and
>>> >legacy_acpi_table_size to figure out since which machine type one doesn't
>>> >have to keep table_blob size the same.
>>> >  
>>> 
>>> OK, let me study the history first.
>>> 
>>> BTW, the legacy here is hardware specification level or qemu software
>>> design level?
>>it's QEMU only, you need to find a version of QEMU (machine type)
>>which didn't have re-sizable MemoryRegion and the next version most likely
>>would have a knob somewhere in machine class definition saying that we don't
>>care about sizes anymore or care about sizes only for previous machine types.
>
>I have got something, but not sure this is correct. So I'd like to
>summarize them here first.
>
>The following commits are ordered in timeline.
>
>1. First we introduced resizable MR.
>
>    a1666142db 2015-01-08 acpi-build: make ROMs RAM blocks resizeable
>    60786ef339 2015-01-08 memory: API to allocate resizeable RAM MR
>
>2. Then acpi_ram_update in introduced
>
>    339240b5cd 2015-04-27 acpi-build: remove dependency from ram_addr.h
>    42d859001d 2015-02-26 acpi-build: fix ACPI RAM management
>
>3. After that we introduce 2.4 machine type
>
>    5cb50e0acc 2015-04-27 pc: add 2.4 machine types
>
>But I don't find a comment stating we don't care about sizes anymore.
>
>Below is my understanding of the dummy table history. 
>
>Before resizable MR was introduced, the dummy table is used to be a
>place holder so that we could add real table in acpi_build_update().
>Othersize the new ACPI table is bigger to be placed into
>build_state->table_mr.
>
>If this is correct, after commit a1666142db ACPI table has the
>capability to expand itself and not necessary to pre-allocate the dummy
>table. Can I say after commit a1666142db, we don't need to put a fake
>table in ACPI?
>
>Another confusion is in the comment of build_mcfg(). In which aspect, it
>relates to migration? Is this before resizable MR introduced?
>

Hi, Igor

I guess you may be busy with other stuff. I'd appreciate it a lot if you could
spare some time slot to take a look at this one. Thanks :-)

Have a great day~

>-- 
>Wei Yang
>Help you, Help me

Wei Yang April 2, 2019, 3:53 a.m. UTC | #9

On Wed, Mar 13, 2019 at 05:09:43PM +0100, Igor Mammedov wrote:
>On Wed, 13 Mar 2019 13:33:59 +0000
>Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> 
>> I am lost at this place.
>> 
>> sig is a part of ACPI table header, you mean the sig is not necessary to
>> be set in ACPI table header?
>> 
>> "skip table generation" means remove build_header() in build_mcfg()?
>I mean do not call build_mcfg() at all when you don't have to.
>
>And when you need to keep table_blob the same size (for old machines)
>using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
>might just work as well. it's still hack but it can live in x86 specific
>acpi_build() keeping build_mcfg() generic.
>
>As for defining what to use as criteria to decide when we need to keep
>table_blob size the same, I don't remember history of it, so I'd suggest
>to look at commit a1666142, study history of acpi_ram_update() and
>legacy_acpi_table_size to figure out since which machine type one doesn't
>have to keep table_blob size the same.
>

Hi, Igor

It took me sometime to go through the migration infrastructure.

Before continuing, I'd like to talk about what I understand to make sure my
direction is correct.

ACPI has a structure named AcpiBuildState, which contains all related
information. During migration, those data in AcpiBuildState should be
transferred to destination, e.g. table_mr, rsdp_mr and link_mr.

In the case related to mcfg, the problem lies in table_mr. And the reason
breaking migration is the size of table_mr is different between source and
destination.(This reason is a guess from those change logs and mails.)

The migration infrastructure has several SaveStateEntry to help migrate
different elements. The one with name "ram" take charge of RAMBlock. So this
SaveStateEntry and its ops is the next step for me to investigate. And from
this to see the effect of different size MemoryRegion during migration.

Is this sounds correct?

Igor Mammedov April 2, 2019, 6:15 a.m. UTC | #10

On Tue, 2 Apr 2019 11:53:43 +0800
Wei Yang <richardw.yang@linux.intel.com> wrote:

> On Wed, Mar 13, 2019 at 05:09:43PM +0100, Igor Mammedov wrote:
> >On Wed, 13 Mar 2019 13:33:59 +0000
> >Wei Yang <richard.weiyang@gmail.com> wrote:
> >  
> >> 
> >> I am lost at this place.
> >> 
> >> sig is a part of ACPI table header, you mean the sig is not necessary to
> >> be set in ACPI table header?
> >> 
> >> "skip table generation" means remove build_header() in build_mcfg()?  
> >I mean do not call build_mcfg() at all when you don't have to.
> >
> >And when you need to keep table_blob the same size (for old machines)
> >using acpi_data_push() to reserve space instead of build_mcfg(sig="QEMU")
> >might just work as well. it's still hack but it can live in x86 specific
> >acpi_build() keeping build_mcfg() generic.
> >
> >As for defining what to use as criteria to decide when we need to keep
> >table_blob size the same, I don't remember history of it, so I'd suggest
> >to look at commit a1666142, study history of acpi_ram_update() and
> >legacy_acpi_table_size to figure out since which machine type one doesn't
> >have to keep table_blob size the same.
> >  
> 
> Hi, Igor
> 
> It took me sometime to go through the migration infrastructure.
> 
> Before continuing, I'd like to talk about what I understand to make sure my
> direction is correct.
> 
> ACPI has a structure named AcpiBuildState, which contains all related
> information. During migration, those data in AcpiBuildState should be
> transferred to destination, e.g. table_mr, rsdp_mr and link_mr.
> 
> In the case related to mcfg, the problem lies in table_mr. And the reason
> breaking migration is the size of table_mr is different between source and
> destination.(This reason is a guess from those change logs and mails.)
This is about right if I recall it correctly.
 
> The migration infrastructure has several SaveStateEntry to help migrate
> different elements. The one with name "ram" take charge of RAMBlock. So this
> SaveStateEntry and its ops is the next step for me to investigate. And from
> this to see the effect of different size MemoryRegion during migration.
I don't think that you need to dig in migration mechanics so deep.
For our purpose of finding QEMU&machine version where migration between
size mismatched MemoryRegions (or RAMBlocks) is fixed is sufficient.

Aside from trying to grasp how migration works internally, you can try
to simulate[1] problem and bisect it to a commit that fixed it.
It still not easy due to amount of combinations you'd need to try,
but it's probably much easier than trying to figure out issue just
by reading code.

1) to stimulate you need to recreate conditions for table_mr jumping
from initial padded size to the next padded size after adding bridge
so you'd have reproducer which makes table_mr differ in size.

If I recall correctly in that time conditions were created by large
amount hotpluggble CPUs (maxcpus - CLI option) and then addidng
PCI bridges.
I'd just hack QEMU to print table_mr size in acpi_build_update()
to find coldplug CLI where we jump to the next padded size and then
use it for bisection with bridge[s] hotplug on source and migrating that
without reboot to target where hotplugged bridges are on CLI (one has to
configure target so it would have all devices that source has including
hotplugged ones)

> Is this sounds correct?
>

Wei Yang April 5, 2019, 8:55 a.m. UTC | #11

On Tue, Apr 02, 2019 at 08:15:12AM +0200, Igor Mammedov wrote:
>On Tue, 2 Apr 2019 11:53:43 +0800
>Wei Yang <richardw.yang@linux.intel.com> wrote:
>
> 
>> The migration infrastructure has several SaveStateEntry to help migrate
>> different elements. The one with name "ram" take charge of RAMBlock. So this
>> SaveStateEntry and its ops is the next step for me to investigate. And from
>> this to see the effect of different size MemoryRegion during migration.
>I don't think that you need to dig in migration mechanics so deep.
>For our purpose of finding QEMU&machine version where migration between
>size mismatched MemoryRegions (or RAMBlocks) is fixed is sufficient.
>

ok.

>Aside from trying to grasp how migration works internally, you can try
>to simulate[1] problem and bisect it to a commit that fixed it.
>It still not easy due to amount of combinations you'd need to try,
>but it's probably much easier than trying to figure out issue just
>by reading code.
>
>1) to stimulate you need to recreate conditions for table_mr jumping
>from initial padded size to the next padded size after adding bridge
>so you'd have reproducer which makes table_mr differ in size.
>
>If I recall correctly in that time conditions were created by large
>amount hotpluggble CPUs (maxcpus - CLI option) and then addidng
>PCI bridges.
>I'd just hack QEMU to print table_mr size in acpi_build_update()
>to find coldplug CLI where we jump to the next padded size and then
>use it for bisection with bridge[s] hotplug on source and migrating that
>without reboot to target where hotplugged bridges are on CLI (one has to
>configure target so it would have all devices that source has including
>hotplugged ones)
>

Igor,

I got some confusion on how to re-produce this case. To be specific, how to 
expand table_mr from initial padded size to next padded size.

Let's see a normal hotplug case first. 

    I did one test to see the how a guest with hot-plug cpu migrate to
    destination.  It looks the migration fails if I just do hot-plug at
    source. So I have to do hot-plug both at source and at destination. This
    will expand the table_mr both at source and destination.

Then let's see the effect of hotplug more devices to exceed original padded
size. There are two cases, before re-sizable MemoryRegion and after.

1) Before re-sizable MemoryRegion introduced

    Before re-sizable MemoryRegion introduced, we just pad table_mr to 4K. And
    this size never gets bigger, if I am right. To be accurate, the table_blob
    would grow to next padded size if we hot-add more cpus/pci bridge, but we
    just copy the original size of MemoryRegion. Even without migration, the
    ACPI table is corrupted when we expand to next padded size.

    Is my understanding correct here?

2) After re-sizable MemoryRegion introduced

    This time both tabl_blob and MemoryRegion grows when it expand to next
    padded size. Since we need to hot-add device at both side, ACPI table
    grows at the same pace.

    Every thing looks good, until one of it exceed the resizable
    MemoryRegion's max size. (Not sure this is possible in reality, while
    possible in theory). Actually, this looks like case 1) when resizable
    MemoryRegion is not introduced. The too big ACPI table get corrupted.

So if my understanding is correct, the procedure you mentioned "expand from
initial padded size to next padded size" only applies to two different max
size resizable MemoryRegion. For other cases, the procedure corrupt the ACPI
table itself.

Then when we look at

    commit 07fb61760cdea7c3f1b9c897513986945bca8e89
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Mon Jul 28 17:34:15 2014 +0200

        pc: hack for migration compatibility from QEMU 2.0

This fix ACPI migration issue before resizable MemoryRegion is
introduced(introduced in 2015-01-08). This looks expand to next padded size
always corrupt ACPI table at that time. And it make me think expand to next
padded size is not the procedure we should do?

And my colleague Wei Wang(in cc) mentioned, to make migration succeed, the
MemoryRegion has to be the same size at both side. So I guess the problem
doesn't lie in hotplug but in "main table" size difference.

For example, we have two version of Qemu: v1 and v2. Their "main table" size
is:

    v1: 3990
    v2: 4020

At this point, their ACPI table all padded to 4k, which is the same.

Then we create a machine with 1 more vcpu by these two versions. This will
expand the table to:

    v1: 4095
    v2: 4125

After padding, v1's ACPI table size is still 4k but v2's is 8k. Now the
migration is broken.

If this analysis is correct, the relationship between migration failure and
ACPI table is "the change of ACPI table size". Any size change of any
ACPI table would break migration. While of course, since we pad the table,
only some combinations of tables would result in a visible real size change in
MemoryRegion.

Then the principle for future ACPI development is to keep all ACPI table size
unchanged.

Now let's back to mcfg table. As the comment mentioned, guest could
enable/disable MCFG, so the code here reserve table no matter it is enabled or
not. This behavior ensures ACPI table size not changed. So do we need to find
the machine type as you suggested before?

Igor Mammedov April 9, 2019, 2:54 p.m. UTC | #12

On Fri, 5 Apr 2019 16:55:30 +0800
Wei Yang <richardw.yang@linux.intel.com> wrote:

> On Tue, Apr 02, 2019 at 08:15:12AM +0200, Igor Mammedov wrote:
> >On Tue, 2 Apr 2019 11:53:43 +0800
> >Wei Yang <richardw.yang@linux.intel.com> wrote:
> >
> >   
> >> The migration infrastructure has several SaveStateEntry to help migrate
> >> different elements. The one with name "ram" take charge of RAMBlock. So this
> >> SaveStateEntry and its ops is the next step for me to investigate. And from
> >> this to see the effect of different size MemoryRegion during migration.  
> >I don't think that you need to dig in migration mechanics so deep.
> >For our purpose of finding QEMU&machine version where migration between
> >size mismatched MemoryRegions (or RAMBlocks) is fixed is sufficient.
> >  
> 
> ok.
> 
> >Aside from trying to grasp how migration works internally, you can try
> >to simulate[1] problem and bisect it to a commit that fixed it.
> >It still not easy due to amount of combinations you'd need to try,
> >but it's probably much easier than trying to figure out issue just
> >by reading code.
> >
> >1) to stimulate you need to recreate conditions for table_mr jumping
> >from initial padded size to the next padded size after adding bridge
> >so you'd have reproducer which makes table_mr differ in size.
> >
> >If I recall correctly in that time conditions were created by large
> >amount hotpluggble CPUs (maxcpus - CLI option) and then addidng
> >PCI bridges.
> >I'd just hack QEMU to print table_mr size in acpi_build_update()
> >to find coldplug CLI where we jump to the next padded size and then
> >use it for bisection with bridge[s] hotplug on source and migrating that
> >without reboot to target where hotplugged bridges are on CLI (one has to
> >configure target so it would have all devices that source has including
> >hotplugged ones)
> >  
> 
> Igor,
> 
> I got some confusion on how to re-produce this case. To be specific, how to 
> expand table_mr from initial padded size to next padded size.

here is reproducer:

$ qemu-system-x86_64-2.2 -M pc-i440fx-2.2 -smp 1,maxcpus=255 -monitor stdio `for i in {1..28}; do echo  "--device pci-bridge,chassis_nr=$i"; done` `for i in {1..24}; do printf --  " --device pci-bridge,bus=pci.1,addr=%x.0,chassis_nr=$i" $i ; done

## hotplug a bridge an migrate to file:

(qemu) device_add pci-bridge,bus=pci.1,addr=19.0,chassis_nr=25
(qemu) migrate "exec:gzip -c > STATEFILE.gz"

## exit and start destination

qemu-system-x86_64  -M pc-i440fx-2.2 -smp 1,maxcpus=255 -monitor stdio `for i in {1..28}; do echo  "--device pci-bridge,chassis_nr=$i"; done` `for i in {1..25}; do printf --  " --device pci-bridge,bus=pci.1,addr=%x.0,chassis_nr=$i" $i ; done`  -incoming "exec: gzip -c -d STATEFILE.gz"

# in case destination is QEMU-2.2 it will fail with:
  Length mismatch: /rom@etc/acpi/tables: 0x20000 in != 0x40000

# in case destination is QEMU-2.3 it will work as expected


> 
> Let's see a normal hotplug case first. 
> 
>     I did one test to see the how a guest with hot-plug cpu migrate to
>     destination.  It looks the migration fails if I just do hot-plug at
>     source. So I have to do hot-plug both at source and at destination. This
>     will expand the table_mr both at source and destination.
> 
> Then let's see the effect of hotplug more devices to exceed original padded
> size. There are two cases, before re-sizable MemoryRegion and after.
> 
> 1) Before re-sizable MemoryRegion introduced
> 
>     Before re-sizable MemoryRegion introduced, we just pad table_mr to 4K. And
>     this size never gets bigger, if I am right. To be accurate, the table_blob
>     would grow to next padded size if we hot-add more cpus/pci bridge, but we
>     just copy the original size of MemoryRegion. Even without migration, the
>     ACPI table is corrupted when we expand to next padded size.
>     
>     Is my understanding correct here?
> 
> 2) After re-sizable MemoryRegion introduced
> 
>     This time both tabl_blob and MemoryRegion grows when it expand to next
>     padded size. Since we need to hot-add device at both side, ACPI table
>     grows at the same pace.
> 
>     Every thing looks good, until one of it exceed the resizable
>     MemoryRegion's max size. (Not sure this is possible in reality, while
>     possible in theory). Actually, this looks like case 1) when resizable
>     MemoryRegion is not introduced. The too big ACPI table get corrupted.
> 
> So if my understanding is correct, the procedure you mentioned "expand from
> initial padded size to next padded size" only applies to two different max
> size resizable MemoryRegion. For other cases, the procedure corrupt the ACPI
> table itself.
> 
> Then when we look at
> 
>     commit 07fb61760cdea7c3f1b9c897513986945bca8e89
>     Author: Paolo Bonzini <pbonzini@redhat.com>
>     Date:   Mon Jul 28 17:34:15 2014 +0200
>     
>         pc: hack for migration compatibility from QEMU 2.0
>     
> This fix ACPI migration issue before resizable MemoryRegion is
> introduced(introduced in 2015-01-08). This looks expand to next padded size
> always corrupt ACPI table at that time. And it make me think expand to next
> padded size is not the procedure we should do?
> 
> And my colleague Wei Wang(in cc) mentioned, to make migration succeed, the
> MemoryRegion has to be the same size at both side. So I guess the problem
> doesn't lie in hotplug but in "main table" size difference.

It's true only for pre-resizable MemoryRegion QEMU versions,
after that size doesn't affect migration anymore.


> For example, we have two version of Qemu: v1 and v2. Their "main table" size
> is:
> 
>     v1: 3990
>     v2: 4020
> 
> At this point, their ACPI table all padded to 4k, which is the same.
> 
> Then we create a machine with 1 more vcpu by these two versions. This will
> expand the table to:
> 
>     v1: 4095
>     v2: 4125
> 
> After padding, v1's ACPI table size is still 4k but v2's is 8k. Now the
> migration is broken.
> 
> If this analysis is correct, the relationship between migration failure and
> ACPI table is "the change of ACPI table size". Any size change of any
you should make distinction between used_length and max_length here.
Migration puts on wire used_length and that's what matter for keeping migration
working.

> ACPI table would break migration. While of course, since we pad the table,
> only some combinations of tables would result in a visible real size change in
> MemoryRegion.
> 
> Then the principle for future ACPI development is to keep all ACPI table size
> unchanged.
once again it applies only for QEMU (versions < 2.1) and that was
the problem (i.e. there always would be configurations that would create
differently sized tables regardless of arbitrary size we would preallocate)
resizable MemoryRegions solved.
 
> Now let's back to mcfg table. As the comment mentioned, guest could
> enable/disable MCFG, so the code here reserve table no matter it is enabled or
> not. This behavior ensures ACPI table size not changed. So do we need to find
> the machine type as you suggested before?
We should be able to drop mcgf 'padding' hack since machine version
which was introduced in the QEMU version that introduced resizable MemoryRegion
as well.

I'll send a patch to address that

Wei Yang April 12, 2019, 5:44 a.m. UTC | #13

On Tue, Apr 09, 2019 at 04:54:15PM +0200, Igor Mammedov wrote:
>> 
>> Let's see a normal hotplug case first. 
>> 
>>     I did one test to see the how a guest with hot-plug cpu migrate to
>>     destination.  It looks the migration fails if I just do hot-plug at
>>     source. So I have to do hot-plug both at source and at destination. This
>>     will expand the table_mr both at source and destination.
>> 
>> Then let's see the effect of hotplug more devices to exceed original padded
>> size. There are two cases, before re-sizable MemoryRegion and after.
>> 
>> 1) Before re-sizable MemoryRegion introduced
>> 
>>     Before re-sizable MemoryRegion introduced, we just pad table_mr to 4K. And
>>     this size never gets bigger, if I am right. To be accurate, the table_blob
>>     would grow to next padded size if we hot-add more cpus/pci bridge, but we
>>     just copy the original size of MemoryRegion. Even without migration, the
>>     ACPI table is corrupted when we expand to next padded size.
>>     
>>     Is my understanding correct here?
>> 
>> 2) After re-sizable MemoryRegion introduced
>> 
>>     This time both tabl_blob and MemoryRegion grows when it expand to next
>>     padded size. Since we need to hot-add device at both side, ACPI table
>>     grows at the same pace.
>> 
>>     Every thing looks good, until one of it exceed the resizable
>>     MemoryRegion's max size. (Not sure this is possible in reality, while
>>     possible in theory). Actually, this looks like case 1) when resizable
>>     MemoryRegion is not introduced. The too big ACPI table get corrupted.
>> 
>> So if my understanding is correct, the procedure you mentioned "expand from
>> initial padded size to next padded size" only applies to two different max
>> size resizable MemoryRegion. For other cases, the procedure corrupt the ACPI
>> table itself.
>> 
>> Then when we look at
>> 
>>     commit 07fb61760cdea7c3f1b9c897513986945bca8e89
>>     Author: Paolo Bonzini <pbonzini@redhat.com>
>>     Date:   Mon Jul 28 17:34:15 2014 +0200
>>     
>>         pc: hack for migration compatibility from QEMU 2.0
>>     
>> This fix ACPI migration issue before resizable MemoryRegion is
>> introduced(introduced in 2015-01-08). This looks expand to next padded size
>> always corrupt ACPI table at that time. And it make me think expand to next
>> padded size is not the procedure we should do?
>> 
>> And my colleague Wei Wang(in cc) mentioned, to make migration succeed, the
>> MemoryRegion has to be the same size at both side. So I guess the problem
>> doesn't lie in hotplug but in "main table" size difference.
>
>It's true only for pre-resizable MemoryRegion QEMU versions,
>after that size doesn't affect migration anymore.
>
>
>> For example, we have two version of Qemu: v1 and v2. Their "main table" size
>> is:
>> 
>>     v1: 3990
>>     v2: 4020
>> 
>> At this point, their ACPI table all padded to 4k, which is the same.
>> 
>> Then we create a machine with 1 more vcpu by these two versions. This will
>> expand the table to:
>> 
>>     v1: 4095
>>     v2: 4125
>> 
>> After padding, v1's ACPI table size is still 4k but v2's is 8k. Now the
>> migration is broken.
>> 
>> If this analysis is correct, the relationship between migration failure and
>> ACPI table is "the change of ACPI table size". Any size change of any
>you should make distinction between used_length and max_length here.
>Migration puts on wire used_length and that's what matter for keeping migration
>working.
>
>> ACPI table would break migration. While of course, since we pad the table,
>> only some combinations of tables would result in a visible real size change in
>> MemoryRegion.
>> 
>> Then the principle for future ACPI development is to keep all ACPI table size
>> unchanged.
>once again it applies only for QEMU (versions < 2.1) and that was
>the problem (i.e. there always would be configurations that would create
>differently sized tables regardless of arbitrary size we would preallocate)
>resizable MemoryRegions solved.
> 
>> Now let's back to mcfg table. As the comment mentioned, guest could
>> enable/disable MCFG, so the code here reserve table no matter it is enabled or
>> not. This behavior ensures ACPI table size not changed. So do we need to find
>> the machine type as you suggested before?
>We should be able to drop mcgf 'padding' hack since machine version
>which was introduced in the QEMU version that introduced resizable MemoryRegion
>as well.
>
>I'll send a patch to address that

Hi, Igor,

We have found the qemu version 2.1 which is with resizable MemoryRegion
enabled and q35 will stop support version before 2.3. The concern about ACPI
mcfg table breaking live migration is solved, right?

If so, I would prepare mcfg refactor patch based on your cleanup.

Igor Mammedov April 12, 2019, 8:14 a.m. UTC | #14

On Fri, 12 Apr 2019 13:44:07 +0800
Wei Yang <richardw.yang@linux.intel.com> wrote:

> On Tue, Apr 09, 2019 at 04:54:15PM +0200, Igor Mammedov wrote:
> >> 
> >> Let's see a normal hotplug case first. 
> >> 
> >>     I did one test to see the how a guest with hot-plug cpu migrate to
> >>     destination.  It looks the migration fails if I just do hot-plug at
> >>     source. So I have to do hot-plug both at source and at destination. This
> >>     will expand the table_mr both at source and destination.
> >> 
> >> Then let's see the effect of hotplug more devices to exceed original padded
> >> size. There are two cases, before re-sizable MemoryRegion and after.
> >> 
> >> 1) Before re-sizable MemoryRegion introduced
> >> 
> >>     Before re-sizable MemoryRegion introduced, we just pad table_mr to 4K. And
> >>     this size never gets bigger, if I am right. To be accurate, the table_blob
> >>     would grow to next padded size if we hot-add more cpus/pci bridge, but we
> >>     just copy the original size of MemoryRegion. Even without migration, the
> >>     ACPI table is corrupted when we expand to next padded size.
> >>     
> >>     Is my understanding correct here?
> >> 
> >> 2) After re-sizable MemoryRegion introduced
> >> 
> >>     This time both tabl_blob and MemoryRegion grows when it expand to next
> >>     padded size. Since we need to hot-add device at both side, ACPI table
> >>     grows at the same pace.
> >> 
> >>     Every thing looks good, until one of it exceed the resizable
> >>     MemoryRegion's max size. (Not sure this is possible in reality, while
> >>     possible in theory). Actually, this looks like case 1) when resizable
> >>     MemoryRegion is not introduced. The too big ACPI table get corrupted.
> >> 
> >> So if my understanding is correct, the procedure you mentioned "expand from
> >> initial padded size to next padded size" only applies to two different max
> >> size resizable MemoryRegion. For other cases, the procedure corrupt the ACPI
> >> table itself.
> >> 
> >> Then when we look at
> >> 
> >>     commit 07fb61760cdea7c3f1b9c897513986945bca8e89
> >>     Author: Paolo Bonzini <pbonzini@redhat.com>
> >>     Date:   Mon Jul 28 17:34:15 2014 +0200
> >>     
> >>         pc: hack for migration compatibility from QEMU 2.0
> >>     
> >> This fix ACPI migration issue before resizable MemoryRegion is
> >> introduced(introduced in 2015-01-08). This looks expand to next padded size
> >> always corrupt ACPI table at that time. And it make me think expand to next
> >> padded size is not the procedure we should do?
> >> 
> >> And my colleague Wei Wang(in cc) mentioned, to make migration succeed, the
> >> MemoryRegion has to be the same size at both side. So I guess the problem
> >> doesn't lie in hotplug but in "main table" size difference.  
> >
> >It's true only for pre-resizable MemoryRegion QEMU versions,
> >after that size doesn't affect migration anymore.
> >
> >  
> >> For example, we have two version of Qemu: v1 and v2. Their "main table" size
> >> is:
> >> 
> >>     v1: 3990
> >>     v2: 4020
> >> 
> >> At this point, their ACPI table all padded to 4k, which is the same.
> >> 
> >> Then we create a machine with 1 more vcpu by these two versions. This will
> >> expand the table to:
> >> 
> >>     v1: 4095
> >>     v2: 4125
> >> 
> >> After padding, v1's ACPI table size is still 4k but v2's is 8k. Now the
> >> migration is broken.
> >> 
> >> If this analysis is correct, the relationship between migration failure and
> >> ACPI table is "the change of ACPI table size". Any size change of any  
> >you should make distinction between used_length and max_length here.
> >Migration puts on wire used_length and that's what matter for keeping migration
> >working.
> >  
> >> ACPI table would break migration. While of course, since we pad the table,
> >> only some combinations of tables would result in a visible real size change in
> >> MemoryRegion.
> >> 
> >> Then the principle for future ACPI development is to keep all ACPI table size
> >> unchanged.  
> >once again it applies only for QEMU (versions < 2.1) and that was
> >the problem (i.e. there always would be configurations that would create
> >differently sized tables regardless of arbitrary size we would preallocate)
> >resizable MemoryRegions solved.
> >   
> >> Now let's back to mcfg table. As the comment mentioned, guest could
> >> enable/disable MCFG, so the code here reserve table no matter it is enabled or
> >> not. This behavior ensures ACPI table size not changed. So do we need to find
> >> the machine type as you suggested before?  
> >We should be able to drop mcgf 'padding' hack since machine version
> >which was introduced in the QEMU version that introduced resizable MemoryRegion
> >as well.
> >
> >I'll send a patch to address that  
> 
> Hi, Igor,
> 
> We have found the qemu version 2.1 which is with resizable MemoryRegion
> enabled and q35 will stop support version before 2.3. The concern about ACPI
> mcfg table breaking live migration is solved, right?
> 
> If so, I would prepare mcfg refactor patch based on your cleanup.
yes, just base your patches on top of
 "[PATCH for-4.1] q35: acpi: do not create dummy MCFG  table"

[RFC,3/3] hw/acpi: Extract build_mcfg

Commit Message

Comments

Patch