diff mbox

[v3,12/16,-,RFC] x86/efi: create new early memory allocator

Message ID 1460723596-13261-13-git-send-email-daniel.kiper@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Kiper April 15, 2016, 12:33 p.m. UTC
There is a problem with place_string() which is used as early memory
allocator. It gets memory chunks starting from start symbol and
going down. Sadly this does not work when Xen is loaded using multiboot2
protocol because start lives on 1 MiB address. So, I tried to use
mem_lower address calculated by GRUB2. However, it works only on some
machines. There are machines in the wild (e.g. Dell PowerEdge R820)
which uses first ~640 KiB for boot services code or data... :-(((

In case of multiboot2 protocol we need that place_string() only allocate
memory chunk for EFI memory map. However, I think that it should be fixed
instead of making another function used just in one case. I thought about
two solutions.

1) We could use native EFI allocation functions (e.g. AllocatePool()
   or AllocatePages()) to get memory chunk. However, later (somewhere
   in __start_xen()) we must copy its contents to safe place or reserve
   this in e820 memory map and map it in Xen virtual address space.
   In later case we must also care about conflicts with e.g. crash
   kernel regions which could be quite difficult.

2) We may allocate memory area statically somewhere in Xen code which
   could be used as memory pool for early dynamic allocations. Looks
   quite simple. Additionally, it would not depend on EFI at all and
   could be used on legacy BIOS platforms if we need it. However, we
   must carefully choose size of this pool. We do not want increase
   Xen binary size too much and waste too much memory but also we must fit
   at least memory map on x86 EFI platforms. As I saw on small machine,
   e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
   than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
   So, it means that we need more than 8 KiB for EFI memory map only.
   Additionally, if we want to use this memory pool for Xen and modules
   command line storage (it would be used when xen.efi is executed as EFI
   application) then we should add, I think, about 1 KiB. In this case,
   to be on safe side, we should assume at least 64 KiB pool for early
   memory allocations, which is about 4 times of our earlier calculations.
   However, during discussion on Xen-devel Jan Beulich suggested that
   just in case we should use 1 MiB memory pool like it was in original
   place_string() implementation. So, let's use 1 MiB as it was proposed.
   If we think that we should not waste unallocated memory in the pool
   on running system then we can mark this region as __initdata and move
   all required data to dynamically allocated places somewhere in __start_xen().

Now solution #2 is implemented but maybe we should consider #1 one day.

Jan Beulich added 1b) Do away with efi_arch_allocate_mmap_buffer() and use
   AllocatePages() uniformly, perhaps with a per-arch specified memory type
   (by means of which you can control whether the memory contents will remain
   preserved until the time you want to look at it). That will eliminate the
   only place_string() you're concerned about, with a patch with better
   diffstat (largely due to the questionable arch hook gone).

However, this solution does not solve conflicts problem described in #1
because EFI memory map is needed during Xen runtime after init phase.
So, finally we would get back to #1. Hmmm... Should I check how Linux
and others cope with that problem?

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
---
 xen/arch/x86/efi/efi-boot.h |   38 ++++++++++++++++++++++++++++++--------
 xen/arch/x86/setup.c        |    3 +--
 2 files changed, 31 insertions(+), 10 deletions(-)

Comments

Jan Beulich May 25, 2016, 8:39 a.m. UTC | #1
>>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
> There is a problem with place_string() which is used as early memory
> allocator. It gets memory chunks starting from start symbol and
> going down. Sadly this does not work when Xen is loaded using multiboot2
> protocol because start lives on 1 MiB address. So, I tried to use
> mem_lower address calculated by GRUB2. However, it works only on some
> machines. There are machines in the wild (e.g. Dell PowerEdge R820)
> which uses first ~640 KiB for boot services code or data... :-(((
> 
> In case of multiboot2 protocol we need that place_string() only allocate
> memory chunk for EFI memory map. However, I think that it should be fixed
> instead of making another function used just in one case. I thought about
> two solutions.
> 
> 1) We could use native EFI allocation functions (e.g. AllocatePool()
>    or AllocatePages()) to get memory chunk. However, later (somewhere
>    in __start_xen()) we must copy its contents to safe place or reserve
>    this in e820 memory map and map it in Xen virtual address space.
>    In later case we must also care about conflicts with e.g. crash
>    kernel regions which could be quite difficult.

I don't see why that would be: Simply use an allocation type that
doesn't lead to the area getting consumed as normal RAM. Nor do
I see the kexec collision potential. Furthermore (and I think I've
said so before) ARM is already using AllocatePool() - just with an
unsuitable memory type -, so doing so on x86 too would allow for
efi_arch_allocate_mmap_buffer() to go away.

> Jan Beulich added 1b) Do away with efi_arch_allocate_mmap_buffer() and use
>    AllocatePages() uniformly, perhaps with a per-arch specified memory type
>    (by means of which you can control whether the memory contents will remain
>    preserved until the time you want to look at it). That will eliminate the
>    only place_string() you're concerned about, with a patch with better
>    diffstat (largely due to the questionable arch hook gone).
> 
> However, this solution does not solve conflicts problem described in #1
> because EFI memory map is needed during Xen runtime after init phase.
> So, finally we would get back to #1. Hmmm... Should I check how Linux
> and others cope with that problem?

Ah, here you mention it actually. Yet you don't explain what conflict
potential you see once using EfiRuntimeServicesData for the allocation.

Jan
Daniel Kiper May 25, 2016, 7:48 p.m. UTC | #2
On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
> > There is a problem with place_string() which is used as early memory
> > allocator. It gets memory chunks starting from start symbol and
> > going down. Sadly this does not work when Xen is loaded using multiboot2
> > protocol because start lives on 1 MiB address. So, I tried to use
> > mem_lower address calculated by GRUB2. However, it works only on some
> > machines. There are machines in the wild (e.g. Dell PowerEdge R820)
> > which uses first ~640 KiB for boot services code or data... :-(((
> >
> > In case of multiboot2 protocol we need that place_string() only allocate
> > memory chunk for EFI memory map. However, I think that it should be fixed
> > instead of making another function used just in one case. I thought about
> > two solutions.
> >
> > 1) We could use native EFI allocation functions (e.g. AllocatePool()
> >    or AllocatePages()) to get memory chunk. However, later (somewhere
> >    in __start_xen()) we must copy its contents to safe place or reserve
> >    this in e820 memory map and map it in Xen virtual address space.
> >    In later case we must also care about conflicts with e.g. crash
> >    kernel regions which could be quite difficult.
>
> I don't see why that would be: Simply use an allocation type that
> doesn't lead to the area getting consumed as normal RAM. Nor do
> I see the kexec collision potential. Furthermore (and I think I've
> said so before) ARM is already using AllocatePool() - just with an
> unsuitable memory type -, so doing so on x86 too would allow for

Nope, they are using standard EfiLoaderData.

> efi_arch_allocate_mmap_buffer() to go away.

That would be great, so, I will think how to solve this issue.

> > Jan Beulich added 1b) Do away with efi_arch_allocate_mmap_buffer() and use
> >    AllocatePages() uniformly, perhaps with a per-arch specified memory type
> >    (by means of which you can control whether the memory contents will remain
> >    preserved until the time you want to look at it). That will eliminate the
> >    only place_string() you're concerned about, with a patch with better
> >    diffstat (largely due to the questionable arch hook gone).
> >
> > However, this solution does not solve conflicts problem described in #1
> > because EFI memory map is needed during Xen runtime after init phase.
> > So, finally we would get back to #1. Hmmm... Should I check how Linux
> > and others cope with that problem?
>
> Ah, here you mention it actually. Yet you don't explain what conflict
> potential you see once using EfiRuntimeServicesData for the allocation.

Good point! IMO, if crash kernel region conflicts with EfiRuntimeServices*
then we should display warning that it cannot be allocated. By the way,
once you mentioned that you have in your queue (I suppose that it is
extremely long) kdump patch which adds functionality to automatically
establish crash kernel region placement. I think that could solve (at
least partially) problem with conflicts. Could you post it?

Daniel
Jan Beulich May 27, 2016, 8:37 a.m. UTC | #3
>>> On 25.05.16 at 21:48, <daniel.kiper@oracle.com> wrote:
> On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
>> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
>> > There is a problem with place_string() which is used as early memory
>> > allocator. It gets memory chunks starting from start symbol and
>> > going down. Sadly this does not work when Xen is loaded using multiboot2
>> > protocol because start lives on 1 MiB address. So, I tried to use
>> > mem_lower address calculated by GRUB2. However, it works only on some
>> > machines. There are machines in the wild (e.g. Dell PowerEdge R820)
>> > which uses first ~640 KiB for boot services code or data... :-(((
>> >
>> > In case of multiboot2 protocol we need that place_string() only allocate
>> > memory chunk for EFI memory map. However, I think that it should be fixed
>> > instead of making another function used just in one case. I thought about
>> > two solutions.
>> >
>> > 1) We could use native EFI allocation functions (e.g. AllocatePool()
>> >    or AllocatePages()) to get memory chunk. However, later (somewhere
>> >    in __start_xen()) we must copy its contents to safe place or reserve
>> >    this in e820 memory map and map it in Xen virtual address space.
>> >    In later case we must also care about conflicts with e.g. crash
>> >    kernel regions which could be quite difficult.
>>
>> I don't see why that would be: Simply use an allocation type that
>> doesn't lead to the area getting consumed as normal RAM. Nor do
>> I see the kexec collision potential. Furthermore (and I think I've
>> said so before) ARM is already using AllocatePool() - just with an
>> unsuitable memory type -, so doing so on x86 too would allow for
> 
> Nope, they are using standard EfiLoaderData.

Note how I said "just with an unsuitable memory type"?

>> > Jan Beulich added 1b) Do away with efi_arch_allocate_mmap_buffer() and use
>> >    AllocatePages() uniformly, perhaps with a per-arch specified memory type
>> >    (by means of which you can control whether the memory contents will remain
>> >    preserved until the time you want to look at it). That will eliminate the
>> >    only place_string() you're concerned about, with a patch with better
>> >    diffstat (largely due to the questionable arch hook gone).
>> >
>> > However, this solution does not solve conflicts problem described in #1
>> > because EFI memory map is needed during Xen runtime after init phase.
>> > So, finally we would get back to #1. Hmmm... Should I check how Linux
>> > and others cope with that problem?
>>
>> Ah, here you mention it actually. Yet you don't explain what conflict
>> potential you see once using EfiRuntimeServicesData for the allocation.
> 
> Good point! IMO, if crash kernel region conflicts with EfiRuntimeServices*
> then we should display warning that it cannot be allocated. By the way,
> once you mentioned that you have in your queue (I suppose that it is
> extremely long) kdump patch which adds functionality to automatically
> establish crash kernel region placement. I think that could solve (at
> least partially) problem with conflicts. Could you post it?

For one, unless asked to be at a specific location, we already dynamically
place that area. The patch that I believe you think of just enhances the
placement to have something between "fully dynamic" and "at a fixed
place". I've never posted it (or even ported it to -unstable) because I've
never got positive feedback on it by those who it was originally created
for. If you think it could be useful, I can certainly revive it.

Jan
Daniel Kiper June 1, 2016, 3:58 p.m. UTC | #4
On Fri, May 27, 2016 at 02:37:06AM -0600, Jan Beulich wrote:
> >>> On 25.05.16 at 21:48, <daniel.kiper@oracle.com> wrote:
> > On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
> >> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
> >> > There is a problem with place_string() which is used as early memory
> >> > allocator. It gets memory chunks starting from start symbol and
> >> > going down. Sadly this does not work when Xen is loaded using multiboot2
> >> > protocol because start lives on 1 MiB address. So, I tried to use
> >> > mem_lower address calculated by GRUB2. However, it works only on some
> >> > machines. There are machines in the wild (e.g. Dell PowerEdge R820)
> >> > which uses first ~640 KiB for boot services code or data... :-(((
> >> >
> >> > In case of multiboot2 protocol we need that place_string() only allocate
> >> > memory chunk for EFI memory map. However, I think that it should be fixed
> >> > instead of making another function used just in one case. I thought about
> >> > two solutions.
> >> >
> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool()
> >> >    or AllocatePages()) to get memory chunk. However, later (somewhere
> >> >    in __start_xen()) we must copy its contents to safe place or reserve
> >> >    this in e820 memory map and map it in Xen virtual address space.
> >> >    In later case we must also care about conflicts with e.g. crash
> >> >    kernel regions which could be quite difficult.
> >>
> >> I don't see why that would be: Simply use an allocation type that
> >> doesn't lead to the area getting consumed as normal RAM. Nor do
> >> I see the kexec collision potential. Furthermore (and I think I've
> >> said so before) ARM is already using AllocatePool() - just with an
> >> unsuitable memory type -, so doing so on x86 too would allow for
> >
> > Nope, they are using standard EfiLoaderData.
>
> Note how I said "just with an unsuitable memory type"?

Could you be more precise?

Daniel
Daniel Kiper June 1, 2016, 4:01 p.m. UTC | #5
On Fri, May 27, 2016 at 02:37:06AM -0600, Jan Beulich wrote:
> >>> On 25.05.16 at 21:48, <daniel.kiper@oracle.com> wrote:
> > On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
> >> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:

[...]

> >> > Jan Beulich added 1b) Do away with efi_arch_allocate_mmap_buffer() and use
> >> >    AllocatePages() uniformly, perhaps with a per-arch specified memory type
> >> >    (by means of which you can control whether the memory contents will remain
> >> >    preserved until the time you want to look at it). That will eliminate the
> >> >    only place_string() you're concerned about, with a patch with better
> >> >    diffstat (largely due to the questionable arch hook gone).
> >> >
> >> > However, this solution does not solve conflicts problem described in #1
> >> > because EFI memory map is needed during Xen runtime after init phase.
> >> > So, finally we would get back to #1. Hmmm... Should I check how Linux
> >> > and others cope with that problem?
> >>
> >> Ah, here you mention it actually. Yet you don't explain what conflict
> >> potential you see once using EfiRuntimeServicesData for the allocation.
> >
> > Good point! IMO, if crash kernel region conflicts with EfiRuntimeServices*
> > then we should display warning that it cannot be allocated. By the way,
> > once you mentioned that you have in your queue (I suppose that it is
> > extremely long) kdump patch which adds functionality to automatically
> > establish crash kernel region placement. I think that could solve (at
> > least partially) problem with conflicts. Could you post it?
>
> For one, unless asked to be at a specific location, we already dynamically
> place that area. The patch that I believe you think of just enhances the

Hmmm... I do not know why I always thought that it is not supported in Xen.

> placement to have something between "fully dynamic" and "at a fixed
> place". I've never posted it (or even ported it to -unstable) because I've
> never got positive feedback on it by those who it was originally created
> for. If you think it could be useful, I can certainly revive it.

Once again, thanks for doing that.

Daniel
Jan Beulich June 1, 2016, 4:02 p.m. UTC | #6
>>> On 01.06.16 at 17:58, <daniel.kiper@oracle.com> wrote:
> On Fri, May 27, 2016 at 02:37:06AM -0600, Jan Beulich wrote:
>> >>> On 25.05.16 at 21:48, <daniel.kiper@oracle.com> wrote:
>> > On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
>> >> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
>> >> > There is a problem with place_string() which is used as early memory
>> >> > allocator. It gets memory chunks starting from start symbol and
>> >> > going down. Sadly this does not work when Xen is loaded using multiboot2
>> >> > protocol because start lives on 1 MiB address. So, I tried to use
>> >> > mem_lower address calculated by GRUB2. However, it works only on some
>> >> > machines. There are machines in the wild (e.g. Dell PowerEdge R820)
>> >> > which uses first ~640 KiB for boot services code or data... :-(((
>> >> >
>> >> > In case of multiboot2 protocol we need that place_string() only allocate
>> >> > memory chunk for EFI memory map. However, I think that it should be fixed
>> >> > instead of making another function used just in one case. I thought about
>> >> > two solutions.
>> >> >
>> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool()
>> >> >    or AllocatePages()) to get memory chunk. However, later (somewhere
>> >> >    in __start_xen()) we must copy its contents to safe place or reserve
>> >> >    this in e820 memory map and map it in Xen virtual address space.
>> >> >    In later case we must also care about conflicts with e.g. crash
>> >> >    kernel regions which could be quite difficult.
>> >>
>> >> I don't see why that would be: Simply use an allocation type that
>> >> doesn't lead to the area getting consumed as normal RAM. Nor do
>> >> I see the kexec collision potential. Furthermore (and I think I've
>> >> said so before) ARM is already using AllocatePool() - just with an
>> >> unsuitable memory type -, so doing so on x86 too would allow for
>> >
>> > Nope, they are using standard EfiLoaderData.
>>
>> Note how I said "just with an unsuitable memory type"?
> 
> Could you be more precise?

What else do you need? Just have the arch specify the memory type to
be used (if ARM really _means_ to use that seemingly wrong type), and
make the rest of the code common.

Jan
Daniel Kiper June 1, 2016, 7:53 p.m. UTC | #7
On Wed, Jun 01, 2016 at 10:02:51AM -0600, Jan Beulich wrote:
> >>> On 01.06.16 at 17:58, <daniel.kiper@oracle.com> wrote:
> > On Fri, May 27, 2016 at 02:37:06AM -0600, Jan Beulich wrote:
> >> >>> On 25.05.16 at 21:48, <daniel.kiper@oracle.com> wrote:
> >> > On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
> >> >> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
> >> >> > There is a problem with place_string() which is used as early memory
> >> >> > allocator. It gets memory chunks starting from start symbol and
> >> >> > going down. Sadly this does not work when Xen is loaded using multiboot2
> >> >> > protocol because start lives on 1 MiB address. So, I tried to use
> >> >> > mem_lower address calculated by GRUB2. However, it works only on some
> >> >> > machines. There are machines in the wild (e.g. Dell PowerEdge R820)
> >> >> > which uses first ~640 KiB for boot services code or data... :-(((
> >> >> >
> >> >> > In case of multiboot2 protocol we need that place_string() only allocate
> >> >> > memory chunk for EFI memory map. However, I think that it should be fixed
> >> >> > instead of making another function used just in one case. I thought about
> >> >> > two solutions.
> >> >> >
> >> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool()
> >> >> >    or AllocatePages()) to get memory chunk. However, later (somewhere
> >> >> >    in __start_xen()) we must copy its contents to safe place or reserve
> >> >> >    this in e820 memory map and map it in Xen virtual address space.
> >> >> >    In later case we must also care about conflicts with e.g. crash
> >> >> >    kernel regions which could be quite difficult.
> >> >>
> >> >> I don't see why that would be: Simply use an allocation type that
> >> >> doesn't lead to the area getting consumed as normal RAM. Nor do
> >> >> I see the kexec collision potential. Furthermore (and I think I've
> >> >> said so before) ARM is already using AllocatePool() - just with an
> >> >> unsuitable memory type -, so doing so on x86 too would allow for
> >> >
> >> > Nope, they are using standard EfiLoaderData.
> >>
> >> Note how I said "just with an unsuitable memory type"?
> >
> > Could you be more precise?
>
> What else do you need? Just have the arch specify the memory type to
> be used (if ARM really _means_ to use that seemingly wrong type), and
> make the rest of the code common.

This is not the problem. I am not sure how do you understand "seemingly
wrong type". Anything outside of the UEFI spec? Or maybe
EfiReservedMemoryType or something like that?

Daniel
Jan Beulich June 2, 2016, 8:11 a.m. UTC | #8
>>> On 01.06.16 at 21:53, <daniel.kiper@oracle.com> wrote:
> On Wed, Jun 01, 2016 at 10:02:51AM -0600, Jan Beulich wrote:
>> >>> On 01.06.16 at 17:58, <daniel.kiper@oracle.com> wrote:
>> > On Fri, May 27, 2016 at 02:37:06AM -0600, Jan Beulich wrote:
>> >> >>> On 25.05.16 at 21:48, <daniel.kiper@oracle.com> wrote:
>> >> > On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
>> >> >> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
>> >> >> > There is a problem with place_string() which is used as early memory
>> >> >> > allocator. It gets memory chunks starting from start symbol and
>> >> >> > going down. Sadly this does not work when Xen is loaded using multiboot2
>> >> >> > protocol because start lives on 1 MiB address. So, I tried to use
>> >> >> > mem_lower address calculated by GRUB2. However, it works only on some
>> >> >> > machines. There are machines in the wild (e.g. Dell PowerEdge R820)
>> >> >> > which uses first ~640 KiB for boot services code or data... :-(((
>> >> >> >
>> >> >> > In case of multiboot2 protocol we need that place_string() only allocate
>> >> >> > memory chunk for EFI memory map. However, I think that it should be fixed
>> >> >> > instead of making another function used just in one case. I thought about
>> >> >> > two solutions.
>> >> >> >
>> >> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool()
>> >> >> >    or AllocatePages()) to get memory chunk. However, later (somewhere
>> >> >> >    in __start_xen()) we must copy its contents to safe place or reserve
>> >> >> >    this in e820 memory map and map it in Xen virtual address space.
>> >> >> >    In later case we must also care about conflicts with e.g. crash
>> >> >> >    kernel regions which could be quite difficult.
>> >> >>
>> >> >> I don't see why that would be: Simply use an allocation type that
>> >> >> doesn't lead to the area getting consumed as normal RAM. Nor do
>> >> >> I see the kexec collision potential. Furthermore (and I think I've
>> >> >> said so before) ARM is already using AllocatePool() - just with an
>> >> >> unsuitable memory type -, so doing so on x86 too would allow for
>> >> >
>> >> > Nope, they are using standard EfiLoaderData.
>> >>
>> >> Note how I said "just with an unsuitable memory type"?
>> >
>> > Could you be more precise?
>>
>> What else do you need? Just have the arch specify the memory type to
>> be used (if ARM really _means_ to use that seemingly wrong type), and
>> make the rest of the code common.
> 
> This is not the problem. I am not sure how do you understand "seemingly
> wrong type". Anything outside of the UEFI spec? Or maybe
> EfiReservedMemoryType or something like that?

No, the type is apparently wrong because quite likely, just like on
x86, they want the memory map to persist past ExitBootServices().
Sooner or later at least some parts of efi_init_memory() (or some
equivalent thereof) will be needed on ARM afaict, and accessing the
memory map at that time will require a change to the memory type.

Jan
Daniel Kiper June 2, 2016, 10:43 a.m. UTC | #9
On Thu, Jun 02, 2016 at 02:11:32AM -0600, Jan Beulich wrote:
> >>> On 01.06.16 at 21:53, <daniel.kiper@oracle.com> wrote:
> > On Wed, Jun 01, 2016 at 10:02:51AM -0600, Jan Beulich wrote:
> >> >>> On 01.06.16 at 17:58, <daniel.kiper@oracle.com> wrote:
> >> > On Fri, May 27, 2016 at 02:37:06AM -0600, Jan Beulich wrote:
> >> >> >>> On 25.05.16 at 21:48, <daniel.kiper@oracle.com> wrote:
> >> >> > On Wed, May 25, 2016 at 02:39:57AM -0600, Jan Beulich wrote:
> >> >> >> >>> On 15.04.16 at 14:33, <daniel.kiper@oracle.com> wrote:
> >> >> >> > There is a problem with place_string() which is used as early memory
> >> >> >> > allocator. It gets memory chunks starting from start symbol and
> >> >> >> > going down. Sadly this does not work when Xen is loaded using multiboot2
> >> >> >> > protocol because start lives on 1 MiB address. So, I tried to use
> >> >> >> > mem_lower address calculated by GRUB2. However, it works only on some
> >> >> >> > machines. There are machines in the wild (e.g. Dell PowerEdge R820)
> >> >> >> > which uses first ~640 KiB for boot services code or data... :-(((
> >> >> >> >
> >> >> >> > In case of multiboot2 protocol we need that place_string() only allocate
> >> >> >> > memory chunk for EFI memory map. However, I think that it should be fixed
> >> >> >> > instead of making another function used just in one case. I thought about
> >> >> >> > two solutions.
> >> >> >> >
> >> >> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool()
> >> >> >> >    or AllocatePages()) to get memory chunk. However, later (somewhere
> >> >> >> >    in __start_xen()) we must copy its contents to safe place or reserve
> >> >> >> >    this in e820 memory map and map it in Xen virtual address space.
> >> >> >> >    In later case we must also care about conflicts with e.g. crash
> >> >> >> >    kernel regions which could be quite difficult.
> >> >> >>
> >> >> >> I don't see why that would be: Simply use an allocation type that
> >> >> >> doesn't lead to the area getting consumed as normal RAM. Nor do
> >> >> >> I see the kexec collision potential. Furthermore (and I think I've
> >> >> >> said so before) ARM is already using AllocatePool() - just with an
> >> >> >> unsuitable memory type -, so doing so on x86 too would allow for
> >> >> >
> >> >> > Nope, they are using standard EfiLoaderData.
> >> >>
> >> >> Note how I said "just with an unsuitable memory type"?
> >> >
> >> > Could you be more precise?
> >>
> >> What else do you need? Just have the arch specify the memory type to
> >> be used (if ARM really _means_ to use that seemingly wrong type), and
> >> make the rest of the code common.
> >
> > This is not the problem. I am not sure how do you understand "seemingly
> > wrong type". Anything outside of the UEFI spec? Or maybe
> > EfiReservedMemoryType or something like that?
>
> No, the type is apparently wrong because quite likely, just like on
> x86, they want the memory map to persist past ExitBootServices().
> Sooner or later at least some parts of efi_init_memory() (or some
> equivalent thereof) will be needed on ARM afaict, and accessing the
> memory map at that time will require a change to the memory type.

I have checked the code once again. On ARM we allocate memory using
EfiLoaderData (not only for memory map) and later deliberately do not
take over these regions. This means that memory map persists. However,
this also means that we are not able to use a lot of memory which is
free from Xen point of view (for more details please check Unified
Extensible Firmware Interface Specification, Version 2.6, section 6.2,
Memory Allocation Services). Why? AFAICT, EfiLoaderCode/EfiLoaderData
types are used very often by EFI applications to allocate memory
dynamically. Most of these applications are dead after ExitBootServices().
So, there is pretty good chance that we are loosing quite big chunk
of memory which simply contains junk except of small portion with memory
map. And on ARM, especially on embedded, this could be painful.

On x86 we take over EfiLoaderCode/EfiLoaderData regions. And I do not
think we should change that behavior. Even maybe we should change
behavior on ARM. Hmmm... Is it possible? Was there a reason to not
take over EfiLoaderCode/EfiLoaderData regions on ARM?

Daniel
Jan Beulich June 2, 2016, 11:10 a.m. UTC | #10
>>> On 02.06.16 at 12:43, <daniel.kiper@oracle.com> wrote:
> I have checked the code once again. On ARM we allocate memory using
> EfiLoaderData (not only for memory map) and later deliberately do not
> take over these regions. This means that memory map persists. However,
> this also means that we are not able to use a lot of memory which is
> free from Xen point of view (for more details please check Unified
> Extensible Firmware Interface Specification, Version 2.6, section 6.2,
> Memory Allocation Services). Why? AFAICT, EfiLoaderCode/EfiLoaderData
> types are used very often by EFI applications to allocate memory
> dynamically. Most of these applications are dead after ExitBootServices().
> So, there is pretty good chance that we are loosing quite big chunk
> of memory which simply contains junk except of small portion with memory
> map. And on ARM, especially on embedded, this could be painful.
> 
> On x86 we take over EfiLoaderCode/EfiLoaderData regions. And I do not
> think we should change that behavior. Even maybe we should change
> behavior on ARM. Hmmm... Is it possible? Was there a reason to not
> take over EfiLoaderCode/EfiLoaderData regions on ARM?

That's not a question to me, I suppose (despite me being the only
one on the To list)?

Jan
Daniel Kiper July 5, 2016, 6:26 p.m. UTC | #11
On Fri, Apr 15, 2016 at 02:33:12PM +0200, Daniel Kiper wrote:
> There is a problem with place_string() which is used as early memory
> allocator. It gets memory chunks starting from start symbol and
> going down. Sadly this does not work when Xen is loaded using multiboot2
> protocol because start lives on 1 MiB address. So, I tried to use
> mem_lower address calculated by GRUB2. However, it works only on some
> machines. There are machines in the wild (e.g. Dell PowerEdge R820)
> which uses first ~640 KiB for boot services code or data... :-(((
>
> In case of multiboot2 protocol we need that place_string() only allocate
> memory chunk for EFI memory map. However, I think that it should be fixed
> instead of making another function used just in one case. I thought about
> two solutions.

I have done more experiments, read more code, etc. You can find results below.

> 1) We could use native EFI allocation functions (e.g. AllocatePool()
>    or AllocatePages()) to get memory chunk. However, later (somewhere
>    in __start_xen()) we must copy its contents to safe place or reserve
>    this in e820 memory map and map it in Xen virtual address space.

I have checked Linux kernel code. It allocates buffer for memory map using
EFI API and later reserve it in e820 memory map. Simple. This should work
for us too but...

>    In later case we must also care about conflicts with e.g. crash
>    kernel regions which could be quite difficult.

This is not a problem since Xen can choose dynamically placement of kdump
region during boot phase and there is no requirement to specify it in boot
command line. This means that it will avoid all allocated/reserved regions
including EFI memory map. However, there is one potential problem which
cannot be avoided simply with current EFI spec. I think about conflicts
with trampoline. It must live below 1 MiB. However, there is not something
similar to "AllocateMaxAddress" option for AllocatePages() which would
ask EFI to allocate memory above a given address (Hmmm... Why UEFI designers
did not added such option, e.g. "AllocateMinAddress"? For me it is obvious
thing if we have "AllocateMaxAddress"). So, it means that we cannot simply
say "give me a memory chunk above 1 MiB". AIUI, Linux guys do not care,
hope that all EFI platforms are smart and AllocatePages() tries hard to
avoid everything below 1 MiB. We can go this way too. However, I am almost
sure that sooner or later we will find crazy platforms which allocate memory
from 0-1 MiB region. We can avoid this by getting EFI memory map, looking for
free regions above 1 MiB and then trying to allocate memory chunk using
AllocatePages() with "AllocateAddress". Does it make sense?

> 2) We may allocate memory area statically somewhere in Xen code which
>    could be used as memory pool for early dynamic allocations. Looks
>    quite simple. Additionally, it would not depend on EFI at all and
>    could be used on legacy BIOS platforms if we need it. However, we
>    must carefully choose size of this pool. We do not want increase
>    Xen binary size too much and waste too much memory but also we must fit
>    at least memory map on x86 EFI platforms. As I saw on small machine,
>    e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
>    than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
>    So, it means that we need more than 8 KiB for EFI memory map only.
>    Additionally, if we want to use this memory pool for Xen and modules
>    command line storage (it would be used when xen.efi is executed as EFI
>    application) then we should add, I think, about 1 KiB. In this case,
>    to be on safe side, we should assume at least 64 KiB pool for early
>    memory allocations, which is about 4 times of our earlier calculations.
>    However, during discussion on Xen-devel Jan Beulich suggested that
>    just in case we should use 1 MiB memory pool like it was in original
>    place_string() implementation. So, let's use 1 MiB as it was proposed.
>    If we think that we should not waste unallocated memory in the pool
>    on running system then we can mark this region as __initdata and move
>    all required data to dynamically allocated places somewhere in __start_xen().

2a) We can create something like .init.bss and put this thing at the end of
    regular .bss section. Then allocate memory chunks starting from lowest
    address. After init phase we can free unused memory as in case of .init.text
    or .init.data sections. This way we do not need allocate any space in
    image file and freeing of unused memory should be simple. What do you
    think about that one?

Daniel
Jan Beulich July 6, 2016, 7:22 a.m. UTC | #12
>>> On 05.07.16 at 20:26, <daniel.kiper@oracle.com> wrote:
> On Fri, Apr 15, 2016 at 02:33:12PM +0200, Daniel Kiper wrote:
>> 1) We could use native EFI allocation functions (e.g. AllocatePool()
>>    or AllocatePages()) to get memory chunk. However, later (somewhere
>>    in __start_xen()) we must copy its contents to safe place or reserve
>>    this in e820 memory map and map it in Xen virtual address space.
> 
> I have checked Linux kernel code. It allocates buffer for memory map using
> EFI API and later reserve it in e820 memory map. Simple. This should work
> for us too but...
> 
>>    In later case we must also care about conflicts with e.g. crash
>>    kernel regions which could be quite difficult.
> 
> This is not a problem since Xen can choose dynamically placement of kdump
> region during boot phase and there is no requirement to specify it in boot
> command line. This means that it will avoid all allocated/reserved regions
> including EFI memory map. However, there is one potential problem which
> cannot be avoided simply with current EFI spec. I think about conflicts
> with trampoline. It must live below 1 MiB. However, there is not something
> similar to "AllocateMaxAddress" option for AllocatePages() which would
> ask EFI to allocate memory above a given address (Hmmm... Why UEFI designers
> did not added such option, e.g. "AllocateMinAddress"? For me it is obvious
> thing if we have "AllocateMaxAddress").

Not obvious to me at all. Allowing an upper bound is natural (for
both DMA purposes and arbitrary other addressing restrictions).
Allowing a lower bound to be specified isn't.

> So, it means that we cannot simply
> say "give me a memory chunk above 1 MiB". AIUI, Linux guys do not care,
> hope that all EFI platforms are smart and AllocatePages() tries hard to
> avoid everything below 1 MiB. We can go this way too. However, I am almost
> sure that sooner or later we will find crazy platforms which allocate memory
> from 0-1 MiB region. We can avoid this by getting EFI memory map, looking for
> free regions above 1 MiB and then trying to allocate memory chunk using
> AllocatePages() with "AllocateAddress". Does it make sense?

I don't see the point of all that, as I don't see why any EFI
implementation would want to deviate from the first line principle
of satisfying allocation requests as high as possible.

Apart from that using (only) EFI allocation mechanisms for
obtaining the trampoline area won't work anyway, as we already
know there are systems where all of the memory below 1Mb is
in use by EFI (mostly with boot kind allocations, i.e. becoming
available after ExitBootServices()).

>> 2) We may allocate memory area statically somewhere in Xen code which
>>    could be used as memory pool for early dynamic allocations. Looks
>>    quite simple. Additionally, it would not depend on EFI at all and
>>    could be used on legacy BIOS platforms if we need it. However, we
>>    must carefully choose size of this pool. We do not want increase
>>    Xen binary size too much and waste too much memory but also we must fit
>>    at least memory map on x86 EFI platforms. As I saw on small machine,
>>    e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
>>    than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
>>    So, it means that we need more than 8 KiB for EFI memory map only.
>>    Additionally, if we want to use this memory pool for Xen and modules
>>    command line storage (it would be used when xen.efi is executed as EFI
>>    application) then we should add, I think, about 1 KiB. In this case,
>>    to be on safe side, we should assume at least 64 KiB pool for early
>>    memory allocations, which is about 4 times of our earlier calculations.
>>    However, during discussion on Xen-devel Jan Beulich suggested that
>>    just in case we should use 1 MiB memory pool like it was in original
>>    place_string() implementation. So, let's use 1 MiB as it was proposed.
>>    If we think that we should not waste unallocated memory in the pool
>>    on running system then we can mark this region as __initdata and move
>>    all required data to dynamically allocated places somewhere in __start_xen().
> 
> 2a) We can create something like .init.bss and put this thing at the end of
>     regular .bss section. Then allocate memory chunks starting from lowest
>     address. After init phase we can free unused memory as in case of .init.text
>     or .init.data sections. This way we do not need allocate any space in
>     image file and freeing of unused memory should be simple. What do you
>     think about that one?

With (again) the caveat of how to size such a region.

Bottom line - I continue to be unconvinced that we need something
"new" here at all.

Jan
Daniel Kiper July 6, 2016, 11:15 a.m. UTC | #13
On Wed, Jul 06, 2016 at 01:22:24AM -0600, Jan Beulich wrote:
> >>> On 05.07.16 at 20:26, <daniel.kiper@oracle.com> wrote:
> > On Fri, Apr 15, 2016 at 02:33:12PM +0200, Daniel Kiper wrote:
> >> 1) We could use native EFI allocation functions (e.g. AllocatePool()
> >>    or AllocatePages()) to get memory chunk. However, later (somewhere
> >>    in __start_xen()) we must copy its contents to safe place or reserve
> >>    this in e820 memory map and map it in Xen virtual address space.
> >
> > I have checked Linux kernel code. It allocates buffer for memory map using
> > EFI API and later reserve it in e820 memory map. Simple. This should work
> > for us too but...
> >
> >>    In later case we must also care about conflicts with e.g. crash
> >>    kernel regions which could be quite difficult.
> >
> > This is not a problem since Xen can choose dynamically placement of kdump
> > region during boot phase and there is no requirement to specify it in boot
> > command line. This means that it will avoid all allocated/reserved regions
> > including EFI memory map. However, there is one potential problem which
> > cannot be avoided simply with current EFI spec. I think about conflicts
> > with trampoline. It must live below 1 MiB. However, there is not something
> > similar to "AllocateMaxAddress" option for AllocatePages() which would
> > ask EFI to allocate memory above a given address (Hmmm... Why UEFI designers
> > did not added such option, e.g. "AllocateMinAddress"? For me it is obvious
> > thing if we have "AllocateMaxAddress").
>
> Not obvious to me at all. Allowing an upper bound is natural (for
> both DMA purposes and arbitrary other addressing restrictions).
> Allowing a lower bound to be specified isn't.

I think that I have shown above that on some platforms this could be useful option.

> > So, it means that we cannot simply
> > say "give me a memory chunk above 1 MiB". AIUI, Linux guys do not care,
> > hope that all EFI platforms are smart and AllocatePages() tries hard to
> > avoid everything below 1 MiB. We can go this way too. However, I am almost
> > sure that sooner or later we will find crazy platforms which allocate memory
> > from 0-1 MiB region. We can avoid this by getting EFI memory map, looking for
> > free regions above 1 MiB and then trying to allocate memory chunk using
> > AllocatePages() with "AllocateAddress". Does it make sense?
>
> I don't see the point of all that, as I don't see why any EFI
> implementation would want to deviate from the first line principle
> of satisfying allocation requests as high as possible.

In general this is good idea. However, I have not seen such requirement in
UEFI spec. So, I suppose that bad things may happen on some EFI implementations
and that is why I proposed a bit "smarter" approach. On the other hand if Linux
does allocations in "simple" way (just AllocatePages() call) then it means that
this solution works on most platforms if not all. So, probably "simple" solution
would work for us too.

Anyway, I think that it is worth considering all potential issues if we are
aware of them in advance.

> Apart from that using (only) EFI allocation mechanisms for
> obtaining the trampoline area won't work anyway, as we already
> know there are systems where all of the memory below 1Mb is
> in use by EFI (mostly with boot kind allocations, i.e. becoming
> available after ExitBootServices()).

I know about that. However, I am talking here about memory allocation
for EFI memory map. As I said above this region may potentially (well,
it looks that probability is low but as I said earlier we should think
and discuss this issue here) conflict with trampoline region. Though
I am not saying how this region (for trampoline) should be allocated
because current solution works well.

> >> 2) We may allocate memory area statically somewhere in Xen code which
> >>    could be used as memory pool for early dynamic allocations. Looks
> >>    quite simple. Additionally, it would not depend on EFI at all and
> >>    could be used on legacy BIOS platforms if we need it. However, we
> >>    must carefully choose size of this pool. We do not want increase
> >>    Xen binary size too much and waste too much memory but also we must fit
> >>    at least memory map on x86 EFI platforms. As I saw on small machine,
> >>    e.g. IBM System x3550 M2 with 8 GiB RAM, memory map may contain more
> >>    than 200 entries. Every entry on x86-64 platform is 40 bytes in size.
> >>    So, it means that we need more than 8 KiB for EFI memory map only.
> >>    Additionally, if we want to use this memory pool for Xen and modules
> >>    command line storage (it would be used when xen.efi is executed as EFI
> >>    application) then we should add, I think, about 1 KiB. In this case,
> >>    to be on safe side, we should assume at least 64 KiB pool for early
> >>    memory allocations, which is about 4 times of our earlier calculations.
> >>    However, during discussion on Xen-devel Jan Beulich suggested that
> >>    just in case we should use 1 MiB memory pool like it was in original
> >>    place_string() implementation. So, let's use 1 MiB as it was proposed.
> >>    If we think that we should not waste unallocated memory in the pool
> >>    on running system then we can mark this region as __initdata and move
> >>    all required data to dynamically allocated places somewhere in __start_xen().
> >
> > 2a) We can create something like .init.bss and put this thing at the end of
> >     regular .bss section. Then allocate memory chunks starting from lowest
> >     address. After init phase we can free unused memory as in case of .init.text
> >     or .init.data sections. This way we do not need allocate any space in
> >     image file and freeing of unused memory should be simple. What do you
> >     think about that one?
>
> With (again) the caveat of how to size such a region.

Yep, you are right.

> Bottom line - I continue to be unconvinced that we need something
> "new" here at all.

Why? I think that I have shown all currently existing issues related
to place_string() usage. Am I missing something? And please remember
that we are talking here about allocating memory for EFI memory map
not for trampoline.

Daniel
diff mbox

Patch

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 6dbb14d..84afffa 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -103,9 +103,36 @@  static void __init relocate_trampoline(unsigned long phys)
         *(u16 *)(*trampoline_ptr + (long)trampoline_ptr) = phys >> 4;
 }
 
+#define EBMALLOC_SIZE	MB(1)
+
+static char __initdata ebmalloc_mem[EBMALLOC_SIZE];
+static char __initdata *ebmalloc_free = NULL;
+
+/* EFI boot allocator. */
+static void __init *ebmalloc(size_t size)
+{
+    void *ptr;
+
+    /*
+     * Init ebmalloc_free on runtime. Static initialization
+     * will not work because it puts virtual address there.
+     */
+    if ( ebmalloc_free == NULL )
+        ebmalloc_free = ebmalloc_mem;
+
+    ptr = ebmalloc_free;
+
+    ebmalloc_free += size;
+
+    if ( ebmalloc_free - ebmalloc_mem > sizeof(ebmalloc_mem) )
+        blexit(L"Out of static memory\r\n");
+
+    return ptr;
+}
+
 static void __init place_string(u32 *addr, const char *s)
 {
-    static char *__initdata alloc = start;
+    char *alloc = NULL;
 
     if ( s && *s )
     {
@@ -113,7 +140,7 @@  static void __init place_string(u32 *addr, const char *s)
         const char *old = (char *)(long)*addr;
         size_t len2 = *addr ? strlen(old) + 1 : 0;
 
-        alloc -= len1 + len2;
+        alloc = ebmalloc(len1 + len2);
         /*
          * Insert new string before already existing one. This is needed
          * for options passed on the command line to override options from
@@ -196,12 +223,7 @@  static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
 
 static void *__init efi_arch_allocate_mmap_buffer(UINTN map_size)
 {
-    place_string(&mbi.mem_upper, NULL);
-    mbi.mem_upper -= map_size;
-    mbi.mem_upper &= -__alignof__(EFI_MEMORY_DESCRIPTOR);
-    if ( mbi.mem_upper < xen_phys_start )
-        return NULL;
-    return (void *)(long)mbi.mem_upper;
+    return ebmalloc(map_size);
 }
 
 static void __init efi_arch_pre_exit_boot(void)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 5d80868..4eb8572 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1057,8 +1057,7 @@  void __init noreturn __start_xen(unsigned long mbi_p)
 
     if ( !xen_phys_start )
         panic("Not enough memory to relocate Xen.");
-    reserve_e820_ram(&boot_e820, efi_enabled(EFI_PLATFORM) ? mbi->mem_upper : __pa(&_start),
-                     __pa(&_end));
+    reserve_e820_ram(&boot_e820, __pa(&_start), __pa(&_end));
 
     /* Late kexec reservation (dynamic start address). */
     kexec_reserve_area(&boot_e820);