diff mbox series

[v3] hvmloader: indicate ACPI tables with "ACPI data" type in e820

Message ID 1599522163-21992-1-git-send-email-igor.druzhinin@citrix.com
State Superseded
Headers show
Series [v3] hvmloader: indicate ACPI tables with "ACPI data" type in e820 | expand

Commit Message

Igor Druzhinin Sept. 7, 2020, 11:42 p.m. UTC
Guest kernel does need to know in some cases where the tables are located
to treat these regions properly. One example is kexec process where
the first kernel needs to pass ACPI region locations to the second
kernel which is now a requirement in Linux after 02a3e3cdb7f12 ("x86/boot:
Parse SRAT table and count immovable memory regions") in order for kexec
transition to actually work.

That commit introduced accesses to XSDT and SRAT while the second kernel
is still using kexec transition tables. The transition tables do not have
e820 "reserved" regions mapped where those tables are located currently
in a Xen guest. Instead "ACPI data" regions are mapped with the transition
tables that was introduced by the following commit 6bbeb276b7 ("x86/kexec:
Add the EFI system tables and ACPI tables to the ident map").

Reserve 1MB (out of 16MB currently available) right after ACPI info page for
ACPI tables exclusively but populate this region on demand and only indicate
populated memory as "ACPI data" since according to ACPI spec that memory is
reclaimable by the guest if necessary. That is close to how we treat
the same ACPI data in PVH guests. 1MB should be enough for now but could be
later extended if required.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
Changes in v3:
- switched from NVS to regular "ACPI data" type by separating ACPI allocations
  into their own region
- gave more information on particular kexec usecase that requires this change

Changes in v2:
- gave more information on NVS type selection and potential alternatives
  in the description
- minor type fixes suggested
---
 tools/firmware/hvmloader/config.h |  3 ++-
 tools/firmware/hvmloader/e820.c   | 21 +++++++++++++++++----
 tools/firmware/hvmloader/util.c   | 29 ++++++++++++++++++++++++++++-
 tools/firmware/hvmloader/util.h   |  2 ++
 4 files changed, 49 insertions(+), 6 deletions(-)

Comments

Jan Beulich Sept. 8, 2020, 9:15 a.m. UTC | #1
On 08.09.2020 01:42, Igor Druzhinin wrote:
> Changes in v3:
> - switched from NVS to regular "ACPI data" type by separating ACPI allocations
>   into their own region
> - gave more information on particular kexec usecase that requires this change

Thanks a lot for doing both of these.

> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -155,6 +155,8 @@ int build_e820_table(struct e820entry *e820,
>  {
>      unsigned int nr = 0, i, j;
>      uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
> +    unsigned long acpi_mem_end =
> +        ACPI_MEMORY_DYNAMIC_START + (acpi_pages_allocated() << PAGE_SHIFT);
>  
>      if ( !lowmem_reserved_base )
>              lowmem_reserved_base = 0xA0000;
> @@ -199,8 +201,19 @@ int build_e820_table(struct e820entry *e820,
>      nr++;
>  
>      /*
> +     * Mark populated reserved memory that contains ACPI tables as ACPI data.
> +     * That should help the guest to treat it correctly later: e.g. pass to
> +     * the next kernel on kexec or reclaim if necessary.
> +     */
> +
> +    e820[nr].addr = RESERVED_MEMBASE;
> +    e820[nr].size = acpi_mem_end - RESERVED_MEMBASE;
> +    e820[nr].type = E820_ACPI;
> +    nr++;

This region may be empty (afaict), when !acpi_enabled. Imo we'd better
avoid adding empty ranges.

> @@ -210,8 +223,8 @@ int build_e820_table(struct e820entry *e820,
>      {
>          uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
>  
> -        e820[nr].addr = RESERVED_MEMBASE;
> -        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
> +        e820[nr].addr = acpi_mem_end;
> +        e820[nr].size = igd_opregion_base - acpi_mem_end;
>          e820[nr].type = E820_RESERVED;
>          nr++;
>  
> @@ -227,7 +240,7 @@ int build_e820_table(struct e820entry *e820,
>      }
>      else
>      {
> -        e820[nr].addr = RESERVED_MEMBASE;
> +        e820[nr].addr = acpi_mem_end;
>          e820[nr].size = (uint32_t)-e820[nr].addr;
>          e820[nr].type = E820_RESERVED;
>          nr++;

In both cases - why not RESERVED_MEMORY_DYNAMIC_START? I.e. why
mark reserved space that isn't in use for anything?

Jan
Igor Druzhinin Sept. 8, 2020, 10:44 a.m. UTC | #2
On 08/09/2020 10:15, Jan Beulich wrote:
> On 08.09.2020 01:42, Igor Druzhinin wrote:
>> Changes in v3:
>> - switched from NVS to regular "ACPI data" type by separating ACPI allocations
>>   into their own region
>> - gave more information on particular kexec usecase that requires this change
> 
> Thanks a lot for doing both of these.
> 
>> --- a/tools/firmware/hvmloader/e820.c
>> +++ b/tools/firmware/hvmloader/e820.c
>> @@ -155,6 +155,8 @@ int build_e820_table(struct e820entry *e820,
>>  {
>>      unsigned int nr = 0, i, j;
>>      uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
>> +    unsigned long acpi_mem_end =
>> +        ACPI_MEMORY_DYNAMIC_START + (acpi_pages_allocated() << PAGE_SHIFT);
>>  
>>      if ( !lowmem_reserved_base )
>>              lowmem_reserved_base = 0xA0000;
>> @@ -199,8 +201,19 @@ int build_e820_table(struct e820entry *e820,
>>      nr++;
>>  
>>      /*
>> +     * Mark populated reserved memory that contains ACPI tables as ACPI data.
>> +     * That should help the guest to treat it correctly later: e.g. pass to
>> +     * the next kernel on kexec or reclaim if necessary.
>> +     */
>> +
>> +    e820[nr].addr = RESERVED_MEMBASE;
>> +    e820[nr].size = acpi_mem_end - RESERVED_MEMBASE;
>> +    e820[nr].type = E820_ACPI;
>> +    nr++;
> 
> This region may be empty (afaict), when !acpi_enabled. Imo we'd better
> avoid adding empty ranges.

Thanks, will gate creation of this region on acpi_enabled.

>> @@ -210,8 +223,8 @@ int build_e820_table(struct e820entry *e820,
>>      {
>>          uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
>>  
>> -        e820[nr].addr = RESERVED_MEMBASE;
>> -        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
>> +        e820[nr].addr = acpi_mem_end;
>> +        e820[nr].size = igd_opregion_base - acpi_mem_end;
>>          e820[nr].type = E820_RESERVED;
>>          nr++;
>>  
>> @@ -227,7 +240,7 @@ int build_e820_table(struct e820entry *e820,
>>      }
>>      else
>>      {
>> -        e820[nr].addr = RESERVED_MEMBASE;
>> +        e820[nr].addr = acpi_mem_end;
>>          e820[nr].size = (uint32_t)-e820[nr].addr;
>>          e820[nr].type = E820_RESERVED;
>>          nr++;
> 
> In both cases - why not RESERVED_MEMORY_DYNAMIC_START? I.e. why
> mark reserved space that isn't in use for anything?

I think it's better to reserve space that a) isn't suppose to be in use for anything - 
we don't really want some MMIO being accidentally mapped there and confusing whatever in
our fragile model, b) that wasn't a hole before so it'd be dangerous to make it that
way here. Overall, I think it's better to keep this space as reserved as possible as
before.

Igor
Jan Beulich Sept. 8, 2020, 10:51 a.m. UTC | #3
On 08.09.2020 12:44, Igor Druzhinin wrote:
> On 08/09/2020 10:15, Jan Beulich wrote:
>> On 08.09.2020 01:42, Igor Druzhinin wrote:
>>> @@ -210,8 +223,8 @@ int build_e820_table(struct e820entry *e820,
>>>      {
>>>          uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
>>>  
>>> -        e820[nr].addr = RESERVED_MEMBASE;
>>> -        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
>>> +        e820[nr].addr = acpi_mem_end;
>>> +        e820[nr].size = igd_opregion_base - acpi_mem_end;
>>>          e820[nr].type = E820_RESERVED;
>>>          nr++;
>>>  
>>> @@ -227,7 +240,7 @@ int build_e820_table(struct e820entry *e820,
>>>      }
>>>      else
>>>      {
>>> -        e820[nr].addr = RESERVED_MEMBASE;
>>> +        e820[nr].addr = acpi_mem_end;
>>>          e820[nr].size = (uint32_t)-e820[nr].addr;
>>>          e820[nr].type = E820_RESERVED;
>>>          nr++;
>>
>> In both cases - why not RESERVED_MEMORY_DYNAMIC_START? I.e. why
>> mark reserved space that isn't in use for anything?
> 
> I think it's better to reserve space that a) isn't suppose to be in use for anything - 
> we don't really want some MMIO being accidentally mapped there and confusing whatever in
> our fragile model, b) that wasn't a hole before so it'd be dangerous to make it that
> way here. Overall, I think it's better to keep this space as reserved as possible as
> before.

Hmm, yes, fair point. Could you please briefly comment on this in
the code, perhaps by extending an existing comment by half a
sentence?

Jan
diff mbox series

Patch

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index d9b4713..ff19b64 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -71,7 +71,8 @@  extern uint64_t pci_hi_mem_start, pci_hi_mem_end;
 #define RESERVED_MEMBASE              0xFC000000
 /* NB. ACPI_INFO_PHYSICAL_ADDRESS *MUST* match definition in acpi/dsdt.asl! */
 #define ACPI_INFO_PHYSICAL_ADDRESS    0xFC000000
-#define RESERVED_MEMORY_DYNAMIC_START 0xFC001000
+#define ACPI_MEMORY_DYNAMIC_START     0xFC001000
+#define RESERVED_MEMORY_DYNAMIC_START 0xFC100000
 #define RESERVED_MEMORY_DYNAMIC_END   0xFE000000
 /*
  * GUEST_RESERVED: Physical address space reserved for guest use.
diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 4d1c955..bc83808 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -155,6 +155,8 @@  int build_e820_table(struct e820entry *e820,
 {
     unsigned int nr = 0, i, j;
     uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+    unsigned long acpi_mem_end =
+        ACPI_MEMORY_DYNAMIC_START + (acpi_pages_allocated() << PAGE_SHIFT);
 
     if ( !lowmem_reserved_base )
             lowmem_reserved_base = 0xA0000;
@@ -199,8 +201,19 @@  int build_e820_table(struct e820entry *e820,
     nr++;
 
     /*
+     * Mark populated reserved memory that contains ACPI tables as ACPI data.
+     * That should help the guest to treat it correctly later: e.g. pass to
+     * the next kernel on kexec or reclaim if necessary.
+     */
+
+    e820[nr].addr = RESERVED_MEMBASE;
+    e820[nr].size = acpi_mem_end - RESERVED_MEMBASE;
+    e820[nr].type = E820_ACPI;
+    nr++;
+
+    /*
      * Explicitly reserve space for special pages.
-     * This space starts at RESERVED_MEMBASE an extends to cover various
+     * This space starts after ACPI region and extends to cover various
      * fixed hardware mappings (e.g., LAPIC, IOAPIC, default SVGA framebuffer).
      *
      * If igd_opregion_pgbase we need to split the RESERVED region in two.
@@ -210,8 +223,8 @@  int build_e820_table(struct e820entry *e820,
     {
         uint32_t igd_opregion_base = igd_opregion_pgbase << PAGE_SHIFT;
 
-        e820[nr].addr = RESERVED_MEMBASE;
-        e820[nr].size = (uint32_t) igd_opregion_base - RESERVED_MEMBASE;
+        e820[nr].addr = acpi_mem_end;
+        e820[nr].size = igd_opregion_base - acpi_mem_end;
         e820[nr].type = E820_RESERVED;
         nr++;
 
@@ -227,7 +240,7 @@  int build_e820_table(struct e820entry *e820,
     }
     else
     {
-        e820[nr].addr = RESERVED_MEMBASE;
+        e820[nr].addr = acpi_mem_end;
         e820[nr].size = (uint32_t)-e820[nr].addr;
         e820[nr].type = E820_RESERVED;
         nr++;
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 0c3f2d2..7da144b 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -871,10 +871,37 @@  static unsigned long acpi_v2p(struct acpi_ctxt *ctxt, void *v)
     return virt_to_phys(v);
 }
 
+static unsigned long acpi_alloc_up = ACPI_MEMORY_DYNAMIC_START - 1;
+
+unsigned long acpi_pages_allocated(void)
+{
+    return (acpi_alloc_up >> PAGE_SHIFT) -
+            ((ACPI_MEMORY_DYNAMIC_START - 1) >> PAGE_SHIFT);
+}
+
 static void *acpi_mem_alloc(struct acpi_ctxt *ctxt,
                             uint32_t size, uint32_t align)
 {
-    return mem_alloc(size, align);
+    unsigned long s, e;
+
+    /* Align to at least 16 bytes. */
+    if ( align < 16 )
+        align = 16;
+
+    s = (acpi_alloc_up + align) & ~(align - 1);
+    e = s + size - 1;
+
+    BUG_ON((e < s) || (e >= RESERVED_MEMORY_DYNAMIC_START));
+
+    while ( (acpi_alloc_up >> PAGE_SHIFT) != (e >> PAGE_SHIFT) )
+    {
+        acpi_alloc_up += PAGE_SIZE;
+        mem_hole_populate_ram(acpi_alloc_up >> PAGE_SHIFT, 1);
+    }
+
+    acpi_alloc_up = e;
+
+    return (void *)s;
 }
 
 static void acpi_mem_free(struct acpi_ctxt *ctxt,
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7bca641..31889de 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -282,6 +282,8 @@  bool check_overlap(uint64_t start, uint64_t size,
 extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
 extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
 
+unsigned long acpi_pages_allocated(void);
+
 struct acpi_config;
 void hvmloader_acpi_build_tables(struct acpi_config *config,
                                  unsigned int physical);