diff mbox series

[v4] hvmloader: indicate ACPI tables with "ACPI data" type in e820

Message ID 1599579679-23983-1-git-send-email-igor.druzhinin@citrix.com
State New
Headers show
Series [v4] hvmloader: indicate ACPI tables with "ACPI data" type in e820 | expand

Commit Message

Igor Druzhinin Sept. 8, 2020, 3:41 p.m. UTC
Guest kernel does need to know in some cases where the tables are located
to treat these regions properly. One example is kexec process where
the first kernel needs to pass ACPI region locations to the second
kernel which is now a requirement in Linux after 02a3e3cdb7f12 ("x86/boot:
Parse SRAT table and count immovable memory regions") in order for kexec
transition to actually work.

That commit introduced accesses to XSDT and SRAT while the second kernel
is still using kexec transition tables. The transition tables do not have
e820 "reserved" regions mapped where those tables are located currently
in a Xen guest. Instead "ACPI data" regions are mapped with the transition
tables that was introduced by the following commit 6bbeb276b7 ("x86/kexec:
Add the EFI system tables and ACPI tables to the ident map").

Reserve 1MB (out of 16MB currently available) right after ACPI info page for
ACPI tables exclusively but populate this region on demand and only indicate
populated memory as "ACPI data" since according to ACPI spec that memory is
reclaimable by the guest if necessary. That is close to how we treat
the same ACPI data in PVH guests. 1MB should be enough for now but could be
later extended if required.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
Changes in v4:
- gated new region creation on acpi_enabled
- added a comment to explain reserved region start point

Changes in v3:
- switched from NVS to regular "ACPI data" type by separating ACPI allocations
  into their own region
- gave more information on particular kexec usecase that requires this change

Changes in v2:
- gave more information on NVS type selection and potential alternatives
  in the description
- minor type fixes suggested
---
 tools/firmware/hvmloader/config.h    |  6 +++++-
 tools/firmware/hvmloader/e820.c      | 26 ++++++++++++++++++++++----
 tools/firmware/hvmloader/hvmloader.c |  3 ++-
 tools/firmware/hvmloader/pci.c       |  1 -
 tools/firmware/hvmloader/util.c      | 29 ++++++++++++++++++++++++++++-
 tools/firmware/hvmloader/util.h      |  2 ++
 6 files changed, 59 insertions(+), 8 deletions(-)

Comments

Jan Beulich Sept. 8, 2020, 3:48 p.m. UTC | #1
On 08.09.2020 17:41, Igor Druzhinin wrote:
> Guest kernel does need to know in some cases where the tables are located
> to treat these regions properly. One example is kexec process where
> the first kernel needs to pass ACPI region locations to the second
> kernel which is now a requirement in Linux after 02a3e3cdb7f12 ("x86/boot:
> Parse SRAT table and count immovable memory regions") in order for kexec
> transition to actually work.
> 
> That commit introduced accesses to XSDT and SRAT while the second kernel
> is still using kexec transition tables. The transition tables do not have
> e820 "reserved" regions mapped where those tables are located currently
> in a Xen guest. Instead "ACPI data" regions are mapped with the transition
> tables that was introduced by the following commit 6bbeb276b7 ("x86/kexec:
> Add the EFI system tables and ACPI tables to the ident map").
> 
> Reserve 1MB (out of 16MB currently available) right after ACPI info page for
> ACPI tables exclusively but populate this region on demand and only indicate
> populated memory as "ACPI data" since according to ACPI spec that memory is
> reclaimable by the guest if necessary. That is close to how we treat
> the same ACPI data in PVH guests. 1MB should be enough for now but could be
> later extended if required.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich Sept. 9, 2020, 4 p.m. UTC | #2
On 08.09.2020 17:41, Igor Druzhinin wrote:
> Guest kernel does need to know in some cases where the tables are located
> to treat these regions properly. One example is kexec process where
> the first kernel needs to pass ACPI region locations to the second
> kernel which is now a requirement in Linux after 02a3e3cdb7f12 ("x86/boot:
> Parse SRAT table and count immovable memory regions") in order for kexec
> transition to actually work.
> 
> That commit introduced accesses to XSDT and SRAT while the second kernel
> is still using kexec transition tables. The transition tables do not have
> e820 "reserved" regions mapped where those tables are located currently
> in a Xen guest. Instead "ACPI data" regions are mapped with the transition
> tables that was introduced by the following commit 6bbeb276b7 ("x86/kexec:
> Add the EFI system tables and ACPI tables to the ident map").
> 
> Reserve 1MB (out of 16MB currently available) right after ACPI info page for
> ACPI tables exclusively but populate this region on demand and only indicate
> populated memory as "ACPI data" since according to ACPI spec that memory is
> reclaimable by the guest if necessary. That is close to how we treat
> the same ACPI data in PVH guests. 1MB should be enough for now but could be
> later extended if required.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

After committing this I'm now somewhat uncertain whether to queue this
for the stable trees. Does either of you (or anyone else) have any clear
opinion either way?

Jan
Igor Druzhinin Sept. 9, 2020, 4:15 p.m. UTC | #3
On 09/09/2020 17:00, Jan Beulich wrote:
> On 08.09.2020 17:41, Igor Druzhinin wrote:
>> Guest kernel does need to know in some cases where the tables are located
>> to treat these regions properly. One example is kexec process where
>> the first kernel needs to pass ACPI region locations to the second
>> kernel which is now a requirement in Linux after 02a3e3cdb7f12 ("x86/boot:
>> Parse SRAT table and count immovable memory regions") in order for kexec
>> transition to actually work.
>>
>> That commit introduced accesses to XSDT and SRAT while the second kernel
>> is still using kexec transition tables. The transition tables do not have
>> e820 "reserved" regions mapped where those tables are located currently
>> in a Xen guest. Instead "ACPI data" regions are mapped with the transition
>> tables that was introduced by the following commit 6bbeb276b7 ("x86/kexec:
>> Add the EFI system tables and ACPI tables to the ident map").
>>
>> Reserve 1MB (out of 16MB currently available) right after ACPI info page for
>> ACPI tables exclusively but populate this region on demand and only indicate
>> populated memory as "ACPI data" since according to ACPI spec that memory is
>> reclaimable by the guest if necessary. That is close to how we treat
>> the same ACPI data in PVH guests. 1MB should be enough for now but could be
>> later extended if required.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> 
> After committing this I'm now somewhat uncertain whether to queue this
> for the stable trees. Does either of you (or anyone else) have any clear
> opinion either way?

That depends on what the upstream support statement was for kexec/kdump for stable
releases.Note that newer guests (e.g. Ubuntu 20.04) won't able to enter kdump
kernel without that.

In Citrix we'd be glad if it reaches at least stable-4.13 as we're backporting this
functionality to our 4.13 based LTSR.

Igor
diff mbox series

Patch

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index d9b4713..844120b 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -2,6 +2,7 @@ 
 #define __HVMLOADER_CONFIG_H__
 
 #include <stdint.h>
+#include <stdbool.h>
 
 enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
 extern enum virtual_vga virtual_vga;
@@ -62,6 +63,8 @@  extern uint8_t ioapic_version;
 extern unsigned long pci_mem_start, pci_mem_end;
 extern uint64_t pci_hi_mem_start, pci_hi_mem_end;
 
+extern bool acpi_enabled;
+
 /* Memory map. */
 #define SCRATCH_PHYSICAL_ADDRESS      0x00010000
 #define HYPERCALL_PHYSICAL_ADDRESS    0x00080000
@@ -71,7 +74,8 @@  extern uint64_t pci_hi_mem_start, pci_hi_mem_end;
 #define RESERVED_MEMBASE              0xFC000000
 /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
 #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
-#define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
+#define ACPI_MEMORY_DYNAMIC_START     0xFC001000
+#define RESERVED_MEMORY_DYNAMIC_START 0xFC100000
 #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
 /*
  * GUEST_RESERVED: Physical address space reserved for guest use.
diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 4d1c955..38bcf18 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -155,6 +155,9 @@  int build_e820_table(struct e820entry *e820,
 {
     unsigned int nr = 0, i, j;
     uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    unsigned long acpi_mem_end = acpi_enabled ?
+        ACPI_MEMORY_DYNAMIC_START + (acpi_pages_allocated() << PAGE_SHIFT) :
+        RESERVED_MEMBASE;
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -199,8 +202,23 @@  int build_e820_table(struct e820entry *e820,
     nr++;
 
     /*
+     * Mark populated reserved memory that contains ACPI tables as ACPI data.
+     * That should help the guest to treat it correctly later: e.g. pass to
+     * the next kernel on kexec or reclaim if necessary.
+     */
+
+    if ( acpi_enabled )
+    {
+        e820[nr].addr = RESERVED_MEMBASE;
+        e820[nr].size = acpi_mem_end - RESERVED_MEMBASE;
+        e820[nr].type = E820_ACPI;
+        nr++;
+    }
+
+    /*
      * Explicitly reserve space for special pages.
-     * This space starts at RESERVED_MEMBASE an extends to cover various
+     * This space starts right after ACPI region (to avoid creating a hole that
+     * might be accidentally occupied by MMIO) and extends to cover various
      * fixed hardware mappings (e.g., LAPIC, IOAPIC, default SVGA framebuffer).
      *
      * If igd_opregion_pgbase we need to split the RESERVED region in two.
@@ -210,8 +228,8 @@  int build_e820_table(struct e820entry *e820,
     {
         uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
 
-        e820[nr].addr = RESERVED_MEMBASE;
-        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
+        e820[nr].addr = acpi_mem_end;
+        e820[nr].size = igd_opregion_base - acpi_mem_end;
         e820[nr].type = E820_RESERVED;
         nr++;
 
@@ -227,7 +245,7 @@  int build_e820_table(struct e820entry *e820,
     }
     else
     {
-        e820[nr].addr = RESERVED_MEMBASE;
+        e820[nr].addr = acpi_mem_end;
         e820[nr].size = (uint32_t)-e820[nr].addr;
         e820[nr].type = E820_RESERVED;
         nr++;
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 598a226..c58841e 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -116,6 +116,8 @@  unsigned long scratch_start = SCRATCH_PHYSICAL_ADDRESS;
 uint32_t ioapic_base_address = 0xfec00000;
 uint8_t ioapic_version;
 
+bool acpi_enabled;
+
 static void init_hypercalls(void)
 {
     uint32_t eax, ebx, ecx, edx;
@@ -321,7 +323,6 @@  const struct hvm_modlist_entry *get_module_entry(
 int main(void)
 {
     const struct bios_config *bios;
-    int acpi_enabled;
     const struct hvm_modlist_entry *bios_module;
 
     /* Initialise hypercall stubs with RET, rendering them no-ops. */
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index dcd097a..72f92d4 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -28,7 +28,6 @@ 
 #include <xen/hvm/ioreq.h>
 #include <xen/hvm/hvm_xs_strings.h>
 #include <xen/hvm/e820.h>
-#include <stdbool.h>
 
 unsigned long pci_mem_start = HVM_BELOW_4G_MMIO_START;
 unsigned long pci_mem_end = PCI_MEM_END;
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d2..7da144b 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -871,10 +871,37 @@  static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v)
     return virt_to_phys(v);
 }
 
+static unsigned long acpi_alloc_up = ACPI_MEMORY_DYNAMIC_START - 1;
+
+unsigned long acpi_pages_allocated(void)
+{
+    return (acpi_alloc_up >> PAGE_SHIFT) -
+            ((ACPI_MEMORY_DYNAMIC_START - 1) >> PAGE_SHIFT);
+}
+
 static void *acpi_mem_alloc(struct acpi_ctxt *ctxt,
                             uint32_t size, uint32_t align)
 {
-    return mem_alloc(size, align);
+    unsigned long s, e;
+
+    /* Align to at least 16 bytes. */
+    if ( align < 16 )
+        align = 16;
+
+    s = (acpi_alloc_up + align) & ~(align - 1);
+    e = s + size - 1;
+
+    BUG_ON((e < s) || (e >= RESERVED_MEMORY_DYNAMIC_START));
+
+    while ( (acpi_alloc_up >> PAGE_SHIFT) != (e >> PAGE_SHIFT) )
+    {
+        acpi_alloc_up += PAGE_SIZE;
+        mem_hole_populate_ram(acpi_alloc_up >> PAGE_SHIFT, 1);
+    }
+
+    acpi_alloc_up = e;
+
+    return (void *)s;
 }
 
 static void acpi_mem_free(struct acpi_ctxt *ctxt,
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7bca641..31889de 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -282,6 +282,8 @@  bool check_overlap(uint64_t start, uint64_t size,
 extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
 extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
 
+unsigned long acpi_pages_allocated(void);
+
 struct acpi_config;
 void hvmloader_acpi_build_tables(struct acpi_config *config,
                                  unsigned int physical);