diff mbox

[2/2] efi/boot: Don't free ebmalloc area at all

Message ID 1488295218-26057-2-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrew Cooper Feb. 28, 2017, 3:20 p.m. UTC
Freeing part of the BSS back for general use proves to be problematic.  It is
not accounted for in xen_in_range(), causing errors when constructing the
IOMMU tables, resulting in a failure to boot.

Other smaller issues are that tboot treats the entire BSS as hypervisor data,
creating and checking a MAC of it on S3, and that, by being 1MB in size,
freeing it guarentees to shatter the hypervisor superpage mappings.

Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
more-reasonable 32kB and keep the entire buffer around after boot.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/common/efi/boot.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

Comments

Jan Beulich Feb. 28, 2017, 4:03 p.m. UTC | #1
>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
> Freeing part of the BSS back for general use proves to be problematic.  It is
> not accounted for in xen_in_range(), causing errors when constructing the
> IOMMU tables, resulting in a failure to boot.
> 
> Other smaller issues are that tboot treats the entire BSS as hypervisor data,
> creating and checking a MAC of it on S3, and that, by being 1MB in size,
> freeing it guarentees to shatter the hypervisor superpage mappings.
> 
> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
> more-reasonable 32kB and keep the entire buffer around after boot.

Well, that's just because right now there's only a single user. The
reason I refused Daniel making it smaller than its predecessor is
that we can't really give a good estimate of how much data may
need storing there: The memory map can have hundreds of entries
and command lines for modules may also be almost arbitrarily long.

What I don't recall, Daniel: Why was it that we can't use EFI boot
services allocations here?

Jan
Andrew Cooper Feb. 28, 2017, 4:08 p.m. UTC | #2
On 28/02/17 16:03, Jan Beulich wrote:
>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
>> Freeing part of the BSS back for general use proves to be problematic.  It is
>> not accounted for in xen_in_range(), causing errors when constructing the
>> IOMMU tables, resulting in a failure to boot.
>>
>> Other smaller issues are that tboot treats the entire BSS as hypervisor data,
>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
>> freeing it guarentees to shatter the hypervisor superpage mappings.
>>
>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
>> more-reasonable 32kB and keep the entire buffer around after boot.
> Well, that's just because right now there's only a single user. The
> reason I refused Daniel making it smaller than its predecessor is
> that we can't really give a good estimate of how much data may
> need storing there: The memory map can have hundreds of entries
> and command lines for modules may also be almost arbitrarily long.
>
> What I don't recall, Daniel: Why was it that we can't use EFI boot
> services allocations here?

From the original commit message,

    1) We could use native EFI allocation functions (e.g. AllocatePool()
       or AllocatePages()) to get memory chunk. However, later (somewhere
       in __start_xen()) we must copy its contents to safe place or reserve
       it in e820 memory map and map it in Xen virtual address space. This
       means that the code referring to Xen command line, loaded modules and
       EFI memory map, mostly in __start_xen(), will be further complicated
       and diverge from legacy BIOS cases. Additionally, both former things
       have to be placed below 4 GiB because their addresses are stored in
       multiboot_info_t structure which has 32-bit relevant members.


One way or another, if we don't want to shatter superpages, we either
must keep the entire allocation, or copy the content out later into a
smaller location once other heap facilities are available.

If we are copying data out, we might as well use EFI heap facilities
rather than rolling our own.

~Andrew
Jan Beulich Feb. 28, 2017, 4:14 p.m. UTC | #3
>>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote:
> On 28/02/17 16:03, Jan Beulich wrote:
>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
>>> Freeing part of the BSS back for general use proves to be problematic.  It 
> is
>>> not accounted for in xen_in_range(), causing errors when constructing the
>>> IOMMU tables, resulting in a failure to boot.
>>>
>>> Other smaller issues are that tboot treats the entire BSS as hypervisor 
> data,
>>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
>>> freeing it guarentees to shatter the hypervisor superpage mappings.
>>>
>>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
>>> more-reasonable 32kB and keep the entire buffer around after boot.
>> Well, that's just because right now there's only a single user. The
>> reason I refused Daniel making it smaller than its predecessor is
>> that we can't really give a good estimate of how much data may
>> need storing there: The memory map can have hundreds of entries
>> and command lines for modules may also be almost arbitrarily long.
>>
>> What I don't recall, Daniel: Why was it that we can't use EFI boot
>> services allocations here?
> 
> From the original commit message,
> 
>     1) We could use native EFI allocation functions (e.g. AllocatePool()
>        or AllocatePages()) to get memory chunk. However, later (somewhere
>        in __start_xen()) we must copy its contents to safe place or reserve
>        it in e820 memory map and map it in Xen virtual address space.

Reading this again, I have to admit that I don't understand why any
copying or reserving would need to be done. We'd need to do runtime
allocations, sure, but I would have thought this goes without saying.

> This
>        means that the code referring to Xen command line, loaded modules and
>        EFI memory map, mostly in __start_xen(), will be further complicated
>        and diverge from legacy BIOS cases. Additionally, both former things
>        have to be placed below 4 GiB because their addresses are stored in
>        multiboot_info_t structure which has 32-bit relevant members.
> 
> 
> One way or another, if we don't want to shatter superpages, we either
> must keep the entire allocation, or copy the content out later into a
> smaller location once other heap facilities are available.
> 
> If we are copying data out, we might as well use EFI heap facilities
> rather than rolling our own.

Well, copying data later won't work, as there are pointers stored to
the original allocation.

Jan
Daniel Kiper Feb. 28, 2017, 5:09 p.m. UTC | #4
On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote:
> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote:
> > On 28/02/17 16:03, Jan Beulich wrote:
> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
> >>> Freeing part of the BSS back for general use proves to be problematic.  It
> > is
> >>> not accounted for in xen_in_range(), causing errors when constructing the
> >>> IOMMU tables, resulting in a failure to boot.
> >>>
> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor
> > data,
> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
> >>> freeing it guarentees to shatter the hypervisor superpage mappings.
> >>>
> >>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
> >>> more-reasonable 32kB and keep the entire buffer around after boot.
> >> Well, that's just because right now there's only a single user. The
> >> reason I refused Daniel making it smaller than its predecessor is
> >> that we can't really give a good estimate of how much data may
> >> need storing there: The memory map can have hundreds of entries
> >> and command lines for modules may also be almost arbitrarily long.
> >>
> >> What I don't recall, Daniel: Why was it that we can't use EFI boot
> >> services allocations here?
> >
> > From the original commit message,
> >
> >     1) We could use native EFI allocation functions (e.g. AllocatePool()
> >        or AllocatePages()) to get memory chunk. However, later (somewhere
> >        in __start_xen()) we must copy its contents to safe place or reserve
> >        it in e820 memory map and map it in Xen virtual address space.
>
> Reading this again, I have to admit that I don't understand why any
> copying or reserving would need to be done. We'd need to do runtime
> allocations, sure, but I would have thought this goes without saying.

We discussed this once but I do not remember the thread. In general Xen EFI
boot code should allocate memory as EfiLoaderData. However, later in
efi_arch_process_memory_map() we treat this memory type as free. This means
that it can be overwritten. So, that is why I mentioned copy or reservation.

> > This
> >        means that the code referring to Xen command line, loaded modules and
> >        EFI memory map, mostly in __start_xen(), will be further complicated
> >        and diverge from legacy BIOS cases. Additionally, both former things
> >        have to be placed below 4 GiB because their addresses are stored in
> >        multiboot_info_t structure which has 32-bit relevant members.
> >
> >
> > One way or another, if we don't want to shatter superpages, we either
> > must keep the entire allocation, or copy the content out later into a
> > smaller location once other heap facilities are available.
> >
> > If we are copying data out, we might as well use EFI heap facilities
> > rather than rolling our own.
>
> Well, copying data later won't work, as there are pointers stored to
> the original allocation.

This is not impossible. Just requires fiddling with mbi struct.

Daniel
Jan Beulich Feb. 28, 2017, 5:24 p.m. UTC | #5
>>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote:
> On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote:
>> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote:
>> > On 28/02/17 16:03, Jan Beulich wrote:
>> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
>> >>> Freeing part of the BSS back for general use proves to be problematic.  It
>> > is
>> >>> not accounted for in xen_in_range(), causing errors when constructing the
>> >>> IOMMU tables, resulting in a failure to boot.
>> >>>
>> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor
>> > data,
>> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
>> >>> freeing it guarentees to shatter the hypervisor superpage mappings.
>> >>>
>> >>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
>> >>> more-reasonable 32kB and keep the entire buffer around after boot.
>> >> Well, that's just because right now there's only a single user. The
>> >> reason I refused Daniel making it smaller than its predecessor is
>> >> that we can't really give a good estimate of how much data may
>> >> need storing there: The memory map can have hundreds of entries
>> >> and command lines for modules may also be almost arbitrarily long.
>> >>
>> >> What I don't recall, Daniel: Why was it that we can't use EFI boot
>> >> services allocations here?
>> >
>> > From the original commit message,
>> >
>> >     1) We could use native EFI allocation functions (e.g. AllocatePool()
>> >        or AllocatePages()) to get memory chunk. However, later (somewhere
>> >        in __start_xen()) we must copy its contents to safe place or reserve
>> >        it in e820 memory map and map it in Xen virtual address space.
>>
>> Reading this again, I have to admit that I don't understand why any
>> copying or reserving would need to be done. We'd need to do runtime
>> allocations, sure, but I would have thought this goes without saying.
> 
> We discussed this once but I do not remember the thread. In general Xen EFI
> boot code should allocate memory as EfiLoaderData. However, later in
> efi_arch_process_memory_map() we treat this memory type as free. This means
> that it can be overwritten. So, that is why I mentioned copy or reservation.

There's nothing disallowing runtime data allocations, afaik.

>> > This
>> >        means that the code referring to Xen command line, loaded modules and
>> >        EFI memory map, mostly in __start_xen(), will be further complicated
>> >        and diverge from legacy BIOS cases. Additionally, both former things
>> >        have to be placed below 4 GiB because their addresses are stored in
>> >        multiboot_info_t structure which has 32-bit relevant members.
>> >
>> >
>> > One way or another, if we don't want to shatter superpages, we either
>> > must keep the entire allocation, or copy the content out later into a
>> > smaller location once other heap facilities are available.
>> >
>> > If we are copying data out, we might as well use EFI heap facilities
>> > rather than rolling our own.
>>
>> Well, copying data later won't work, as there are pointers stored to
>> the original allocation.
> 
> This is not impossible. Just requires fiddling with mbi struct.

That would cover this one specific use only, and even then how
do you know there weren't any derived pointers stored elsewhere?

Jan
Daniel Kiper Feb. 28, 2017, 5:41 p.m. UTC | #6
On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote:
> On 28/02/17 16:03, Jan Beulich wrote:
> >>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
> >> Freeing part of the BSS back for general use proves to be problematic.  It is
> >> not accounted for in xen_in_range(), causing errors when constructing the
> >> IOMMU tables, resulting in a failure to boot.
> >>
> >> Other smaller issues are that tboot treats the entire BSS as hypervisor data,
> >> creating and checking a MAC of it on S3, and that, by being 1MB in size,
> >> freeing it guarentees to shatter the hypervisor superpage mappings.
> >>
> >> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
> >> more-reasonable 32kB and keep the entire buffer around after boot.
> > Well, that's just because right now there's only a single user. The
> > reason I refused Daniel making it smaller than its predecessor is
> > that we can't really give a good estimate of how much data may
> > need storing there: The memory map can have hundreds of entries
> > and command lines for modules may also be almost arbitrarily long.
> >
> > What I don't recall, Daniel: Why was it that we can't use EFI boot
> > services allocations here?
>
> From the original commit message,
>
>     1) We could use native EFI allocation functions (e.g. AllocatePool()
>        or AllocatePages()) to get memory chunk. However, later (somewhere
>        in __start_xen()) we must copy its contents to safe place or reserve
>        it in e820 memory map and map it in Xen virtual address space. This
>        means that the code referring to Xen command line, loaded modules and
>        EFI memory map, mostly in __start_xen(), will be further complicated
>        and diverge from legacy BIOS cases. Additionally, both former things
>        have to be placed below 4 GiB because their addresses are stored in
>        multiboot_info_t structure which has 32-bit relevant members.
>
>
> One way or another, if we don't want to shatter superpages, we either
> must keep the entire allocation, or copy the content out later into a
> smaller location once other heap facilities are available.

Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not?

In regards to tboot I think that we should just take into account during
calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem
region to the end of BSS would help somehow here.

Daniel
Andrew Cooper Feb. 28, 2017, 5:58 p.m. UTC | #7
On 28/02/17 17:41, Daniel Kiper wrote:
> On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote:
>> On 28/02/17 16:03, Jan Beulich wrote:
>>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
>>>> Freeing part of the BSS back for general use proves to be problematic.  It is
>>>> not accounted for in xen_in_range(), causing errors when constructing the
>>>> IOMMU tables, resulting in a failure to boot.
>>>>
>>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data,
>>>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
>>>> freeing it guarentees to shatter the hypervisor superpage mappings.
>>>>
>>>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
>>>> more-reasonable 32kB and keep the entire buffer around after boot.
>>> Well, that's just because right now there's only a single user. The
>>> reason I refused Daniel making it smaller than its predecessor is
>>> that we can't really give a good estimate of how much data may
>>> need storing there: The memory map can have hundreds of entries
>>> and command lines for modules may also be almost arbitrarily long.
>>>
>>> What I don't recall, Daniel: Why was it that we can't use EFI boot
>>> services allocations here?
>> From the original commit message,
>>
>>     1) We could use native EFI allocation functions (e.g. AllocatePool()
>>        or AllocatePages()) to get memory chunk. However, later (somewhere
>>        in __start_xen()) we must copy its contents to safe place or reserve
>>        it in e820 memory map and map it in Xen virtual address space. This
>>        means that the code referring to Xen command line, loaded modules and
>>        EFI memory map, mostly in __start_xen(), will be further complicated
>>        and diverge from legacy BIOS cases. Additionally, both former things
>>        have to be placed below 4 GiB because their addresses are stored in
>>        multiboot_info_t structure which has 32-bit relevant members.
>>
>>
>> One way or another, if we don't want to shatter superpages, we either
>> must keep the entire allocation, or copy the content out later into a
>> smaller location once other heap facilities are available.
> Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not?

Xen is purposefully laid out like this:

.text, 2M aligned, R/X
.rodata, 2M aligned, R/NX
.init.*, 2M aligned, R/W/X (reclaimed)
.data + .bss, 2M aligned, R/W/NX

(In reality there is a syslinux bug which caused me to revert the 2M
alignment in non-EFI builds, so we are still using 4k alignment, but I
need to find a way to work around this and move back to enforcing the 2M
alignment.)

When .init gets reclaimed, this leaves a (deliberate) hole which is all
2M aligned, which has no impact on the adjacent 2M superpages.

When ebmalloc gets reclaimed, it must shatter one or two superpages
mapping the .data/.bss section.

> In regards to tboot I think that we should just take into account during
> calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem
> region to the end of BSS would help somehow here.

At the moment, all the ranges of Xen VA space are bounded by linker symbols.

It is certainly possible to could account for the ebmalloc object in the
middle of the BSS, but it is turning into a layering violation.

~Andrew
Daniel Kiper Feb. 28, 2017, 6:45 p.m. UTC | #8
On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote:
> >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote:
> > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote:
> >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote:
> >> > On 28/02/17 16:03, Jan Beulich wrote:
> >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
> >> >>> Freeing part of the BSS back for general use proves to be problematic.  It
> >> > is
> >> >>> not accounted for in xen_in_range(), causing errors when constructing the
> >> >>> IOMMU tables, resulting in a failure to boot.
> >> >>>
> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor
> >> > data,
> >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
> >> >>> freeing it guarentees to shatter the hypervisor superpage mappings.
> >> >>>
> >> >>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
> >> >>> more-reasonable 32kB and keep the entire buffer around after boot.
> >> >> Well, that's just because right now there's only a single user. The
> >> >> reason I refused Daniel making it smaller than its predecessor is
> >> >> that we can't really give a good estimate of how much data may
> >> >> need storing there: The memory map can have hundreds of entries
> >> >> and command lines for modules may also be almost arbitrarily long.
> >> >>
> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot
> >> >> services allocations here?
> >> >
> >> > From the original commit message,
> >> >
> >> >     1) We could use native EFI allocation functions (e.g. AllocatePool()
> >> >        or AllocatePages()) to get memory chunk. However, later (somewhere
> >> >        in __start_xen()) we must copy its contents to safe place or reserve
> >> >        it in e820 memory map and map it in Xen virtual address space.
> >>
> >> Reading this again, I have to admit that I don't understand why any
> >> copying or reserving would need to be done. We'd need to do runtime
> >> allocations, sure, but I would have thought this goes without saying.
> >
> > We discussed this once but I do not remember the thread. In general Xen EFI
> > boot code should allocate memory as EfiLoaderData. However, later in
> > efi_arch_process_memory_map() we treat this memory type as free. This means
> > that it can be overwritten. So, that is why I mentioned copy or reservation.
>
> There's nothing disallowing runtime data allocations, afaik.

Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO.
AIUI, this is not intended to use in that way.

> >> > This
> >> >        means that the code referring to Xen command line, loaded modules and
> >> >        EFI memory map, mostly in __start_xen(), will be further complicated
> >> >        and diverge from legacy BIOS cases. Additionally, both former things
> >> >        have to be placed below 4 GiB because their addresses are stored in
> >> >        multiboot_info_t structure which has 32-bit relevant members.
> >> >
> >> >
> >> > One way or another, if we don't want to shatter superpages, we either
> >> > must keep the entire allocation, or copy the content out later into a
> >> > smaller location once other heap facilities are available.
> >> >
> >> > If we are copying data out, we might as well use EFI heap facilities
> >> > rather than rolling our own.
> >>
> >> Well, copying data later won't work, as there are pointers stored to
> >> the original allocation.
> >
> > This is not impossible. Just requires fiddling with mbi struct.
>
> That would cover this one specific use only, and even then how
> do you know there weren't any derived pointers stored elsewhere?

Right, we should avoid it. It is too fragile approach.

Daniel
Daniel Kiper Feb. 28, 2017, 7:01 p.m. UTC | #9
On Tue, Feb 28, 2017 at 05:58:26PM +0000, Andrew Cooper wrote:
> On 28/02/17 17:41, Daniel Kiper wrote:
> > On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote:
> >> On 28/02/17 16:03, Jan Beulich wrote:
> >>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
> >>>> Freeing part of the BSS back for general use proves to be problematic.  It is
> >>>> not accounted for in xen_in_range(), causing errors when constructing the
> >>>> IOMMU tables, resulting in a failure to boot.
> >>>>
> >>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data,
> >>>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
> >>>> freeing it guarentees to shatter the hypervisor superpage mappings.
> >>>>
> >>>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
> >>>> more-reasonable 32kB and keep the entire buffer around after boot.
> >>> Well, that's just because right now there's only a single user. The
> >>> reason I refused Daniel making it smaller than its predecessor is
> >>> that we can't really give a good estimate of how much data may
> >>> need storing there: The memory map can have hundreds of entries
> >>> and command lines for modules may also be almost arbitrarily long.
> >>>
> >>> What I don't recall, Daniel: Why was it that we can't use EFI boot
> >>> services allocations here?
> >> From the original commit message,
> >>
> >>     1) We could use native EFI allocation functions (e.g. AllocatePool()
> >>        or AllocatePages()) to get memory chunk. However, later (somewhere
> >>        in __start_xen()) we must copy its contents to safe place or reserve
> >>        it in e820 memory map and map it in Xen virtual address space. This
> >>        means that the code referring to Xen command line, loaded modules and
> >>        EFI memory map, mostly in __start_xen(), will be further complicated
> >>        and diverge from legacy BIOS cases. Additionally, both former things
> >>        have to be placed below 4 GiB because their addresses are stored in
> >>        multiboot_info_t structure which has 32-bit relevant members.
> >>
> >>
> >> One way or another, if we don't want to shatter superpages, we either
> >> must keep the entire allocation, or copy the content out later into a
> >> smaller location once other heap facilities are available.
> > Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not?
>
> Xen is purposefully laid out like this:
>
> .text, 2M aligned, R/X
> .rodata, 2M aligned, R/NX
> .init.*, 2M aligned, R/W/X (reclaimed)
> .data + .bss, 2M aligned, R/W/NX

AIUI, the purpose is to have properly mapped (in terms of R/W/X/NX)
sections. Right?

> (In reality there is a syslinux bug which caused me to revert the 2M
> alignment in non-EFI builds, so we are still using 4k alignment, but I
> need to find a way to work around this and move back to enforcing the 2M
> alignment.)
>
> When .init gets reclaimed, this leaves a (deliberate) hole which is all
> 2M aligned, which has no impact on the adjacent 2M superpages.
>
> When ebmalloc gets reclaimed, it must shatter one or two superpages
> mapping the .data/.bss section.

Looking at this I do not have a better idea for fix off the top of my head.
So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave
comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region.
Should I post a patch or whether you would like to do it?

> > In regards to tboot I think that we should just take into account during
> > calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem
> > region to the end of BSS would help somehow here.
>
> At the moment, all the ranges of Xen VA space are bounded by linker symbols.
>
> It is certainly possible to could account for the ebmalloc object in the
> middle of the BSS, but it is turning into a layering violation.

I am not sure what do you mean by "layering violation".

Daniel
Andrew Cooper Feb. 28, 2017, 7:09 p.m. UTC | #10
On 28/02/17 19:01, Daniel Kiper wrote:
> On Tue, Feb 28, 2017 at 05:58:26PM +0000, Andrew Cooper wrote:
>> On 28/02/17 17:41, Daniel Kiper wrote:
>>> On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote:
>>>> On 28/02/17 16:03, Jan Beulich wrote:
>>>>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
>>>>>> Freeing part of the BSS back for general use proves to be problematic.  It is
>>>>>> not accounted for in xen_in_range(), causing errors when constructing the
>>>>>> IOMMU tables, resulting in a failure to boot.
>>>>>>
>>>>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data,
>>>>>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
>>>>>> freeing it guarentees to shatter the hypervisor superpage mappings.
>>>>>>
>>>>>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
>>>>>> more-reasonable 32kB and keep the entire buffer around after boot.
>>>>> Well, that's just because right now there's only a single user. The
>>>>> reason I refused Daniel making it smaller than its predecessor is
>>>>> that we can't really give a good estimate of how much data may
>>>>> need storing there: The memory map can have hundreds of entries
>>>>> and command lines for modules may also be almost arbitrarily long.
>>>>>
>>>>> What I don't recall, Daniel: Why was it that we can't use EFI boot
>>>>> services allocations here?
>>>> From the original commit message,
>>>>
>>>>     1) We could use native EFI allocation functions (e.g. AllocatePool()
>>>>        or AllocatePages()) to get memory chunk. However, later (somewhere
>>>>        in __start_xen()) we must copy its contents to safe place or reserve
>>>>        it in e820 memory map and map it in Xen virtual address space. This
>>>>        means that the code referring to Xen command line, loaded modules and
>>>>        EFI memory map, mostly in __start_xen(), will be further complicated
>>>>        and diverge from legacy BIOS cases. Additionally, both former things
>>>>        have to be placed below 4 GiB because their addresses are stored in
>>>>        multiboot_info_t structure which has 32-bit relevant members.
>>>>
>>>>
>>>> One way or another, if we don't want to shatter superpages, we either
>>>> must keep the entire allocation, or copy the content out later into a
>>>> smaller location once other heap facilities are available.
>>> Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not?
>> Xen is purposefully laid out like this:
>>
>> .text, 2M aligned, R/X
>> .rodata, 2M aligned, R/NX
>> .init.*, 2M aligned, R/W/X (reclaimed)
>> .data + .bss, 2M aligned, R/W/NX
> AIUI, the purpose is to have properly mapped (in terms of R/W/X/NX)
> sections. Right?

When I did the original work, it was both to get proper pagetable
permissions, and to get all of the compiled bits of Xen living in 2M
superpages (which allows the entire hypervisor to operate in 5 TLB entries!)

>
>> (In reality there is a syslinux bug which caused me to revert the 2M
>> alignment in non-EFI builds, so we are still using 4k alignment, but I
>> need to find a way to work around this and move back to enforcing the 2M
>> alignment.)
>>
>> When .init gets reclaimed, this leaves a (deliberate) hole which is all
>> 2M aligned, which has no impact on the adjacent 2M superpages.
>>
>> When ebmalloc gets reclaimed, it must shatter one or two superpages
>> mapping the .data/.bss section.
> Looking at this I do not have a better idea for fix off the top of my head.
> So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave
> comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region.
> Should I post a patch or whether you would like to do it?

The patch at the root of this thread is basically that.

As a stopgap solution to unblock staging, I think I should drop the
adjustment of MB(1) to KB(32) and submit the patch.

This satisfies Jan's request to not make the current buffer smaller than
it currently is, and will give us more time to discuss alternative
solutions, at the cost of wasting 1MB of RAM.  (Not exactly breaking the
bank these days, but definitely something which should be fixed before
4.9 ships.)

>
>>> In regards to tboot I think that we should just take into account during
>>> calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem
>>> region to the end of BSS would help somehow here.
>> At the moment, all the ranges of Xen VA space are bounded by linker symbols.
>>
>> It is certainly possible to could account for the ebmalloc object in the
>> middle of the BSS, but it is turning into a layering violation.
> I am not sure what do you mean by "layering violation".

It feels hacky, and the wrong way to solve the problem.

~Andrew
Daniel Kiper Feb. 28, 2017, 7:21 p.m. UTC | #11
On Tue, Feb 28, 2017 at 07:09:14PM +0000, Andrew Cooper wrote:
> On 28/02/17 19:01, Daniel Kiper wrote:
> > On Tue, Feb 28, 2017 at 05:58:26PM +0000, Andrew Cooper wrote:
> >> On 28/02/17 17:41, Daniel Kiper wrote:
> >>> On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote:
> >>>> On 28/02/17 16:03, Jan Beulich wrote:
> >>>>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
> >>>>>> Freeing part of the BSS back for general use proves to be problematic.  It is
> >>>>>> not accounted for in xen_in_range(), causing errors when constructing the
> >>>>>> IOMMU tables, resulting in a failure to boot.
> >>>>>>
> >>>>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data,
> >>>>>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
> >>>>>> freeing it guarentees to shatter the hypervisor superpage mappings.
> >>>>>>
> >>>>>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
> >>>>>> more-reasonable 32kB and keep the entire buffer around after boot.
> >>>>> Well, that's just because right now there's only a single user. The
> >>>>> reason I refused Daniel making it smaller than its predecessor is
> >>>>> that we can't really give a good estimate of how much data may
> >>>>> need storing there: The memory map can have hundreds of entries
> >>>>> and command lines for modules may also be almost arbitrarily long.
> >>>>>
> >>>>> What I don't recall, Daniel: Why was it that we can't use EFI boot
> >>>>> services allocations here?
> >>>> From the original commit message,
> >>>>
> >>>>     1) We could use native EFI allocation functions (e.g. AllocatePool()
> >>>>        or AllocatePages()) to get memory chunk. However, later (somewhere
> >>>>        in __start_xen()) we must copy its contents to safe place or reserve
> >>>>        it in e820 memory map and map it in Xen virtual address space. This
> >>>>        means that the code referring to Xen command line, loaded modules and
> >>>>        EFI memory map, mostly in __start_xen(), will be further complicated
> >>>>        and diverge from legacy BIOS cases. Additionally, both former things
> >>>>        have to be placed below 4 GiB because their addresses are stored in
> >>>>        multiboot_info_t structure which has 32-bit relevant members.
> >>>>
> >>>>
> >>>> One way or another, if we don't want to shatter superpages, we either
> >>>> must keep the entire allocation, or copy the content out later into a
> >>>> smaller location once other heap facilities are available.
> >>> Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not?
> >> Xen is purposefully laid out like this:
> >>
> >> .text, 2M aligned, R/X
> >> .rodata, 2M aligned, R/NX
> >> .init.*, 2M aligned, R/W/X (reclaimed)
> >> .data + .bss, 2M aligned, R/W/NX
> > AIUI, the purpose is to have properly mapped (in terms of R/W/X/NX)
> > sections. Right?
>
> When I did the original work, it was both to get proper pagetable
> permissions, and to get all of the compiled bits of Xen living in 2M
> superpages (which allows the entire hypervisor to operate in 5 TLB entries!)

Thanks for explanation.

> >> (In reality there is a syslinux bug which caused me to revert the 2M
> >> alignment in non-EFI builds, so we are still using 4k alignment, but I
> >> need to find a way to work around this and move back to enforcing the 2M
> >> alignment.)
> >>
> >> When .init gets reclaimed, this leaves a (deliberate) hole which is all
> >> 2M aligned, which has no impact on the adjacent 2M superpages.
> >>
> >> When ebmalloc gets reclaimed, it must shatter one or two superpages
> >> mapping the .data/.bss section.
> > Looking at this I do not have a better idea for fix off the top of my head.
> > So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave
> > comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region.
> > Should I post a patch or whether you would like to do it?
>
> The patch at the root of this thread is basically that.

Yep, I know.

> As a stopgap solution to unblock staging, I think I should drop the
> adjustment of MB(1) to KB(32) and submit the patch.

Sounds like a plan.

> This satisfies Jan's request to not make the current buffer smaller than
> it currently is, and will give us more time to discuss alternative
> solutions, at the cost of wasting 1MB of RAM.  (Not exactly breaking the
> bank these days, but definitely something which should be fixed before
> 4.9 ships.)

Granted!

Daniel
Jan Beulich March 1, 2017, 7:48 a.m. UTC | #12
>>> On 28.02.17 at 19:45, <daniel.kiper@oracle.com> wrote:
> On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote:
>> >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote:
>> > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote:
>> >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote:
>> >> > On 28/02/17 16:03, Jan Beulich wrote:
>> >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
>> >> >>> Freeing part of the BSS back for general use proves to be problematic.  It
>> >> > is
>> >> >>> not accounted for in xen_in_range(), causing errors when constructing the
>> >> >>> IOMMU tables, resulting in a failure to boot.
>> >> >>>
>> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor
>> >> > data,
>> >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
>> >> >>> freeing it guarentees to shatter the hypervisor superpage mappings.
>> >> >>>
>> >> >>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
>> >> >>> more-reasonable 32kB and keep the entire buffer around after boot.
>> >> >> Well, that's just because right now there's only a single user. The
>> >> >> reason I refused Daniel making it smaller than its predecessor is
>> >> >> that we can't really give a good estimate of how much data may
>> >> >> need storing there: The memory map can have hundreds of entries
>> >> >> and command lines for modules may also be almost arbitrarily long.
>> >> >>
>> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot
>> >> >> services allocations here?
>> >> >
>> >> > From the original commit message,
>> >> >
>> >> >     1) We could use native EFI allocation functions (e.g. AllocatePool()
>> >> >        or AllocatePages()) to get memory chunk. However, later (somewhere
>> >> >        in __start_xen()) we must copy its contents to safe place or reserve
>> >> >        it in e820 memory map and map it in Xen virtual address space.
>> >>
>> >> Reading this again, I have to admit that I don't understand why any
>> >> copying or reserving would need to be done. We'd need to do runtime
>> >> allocations, sure, but I would have thought this goes without saying.
>> >
>> > We discussed this once but I do not remember the thread. In general Xen EFI
>> > boot code should allocate memory as EfiLoaderData. However, later in
>> > efi_arch_process_memory_map() we treat this memory type as free. This means
>> > that it can be overwritten. So, that is why I mentioned copy or reservation.
>>
>> There's nothing disallowing runtime data allocations, afaik.
> 
> Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO.
> AIUI, this is not intended to use in that way.

And that you base on what statement(s) in the spec?

Jan
Jan Beulich March 1, 2017, 8:02 a.m. UTC | #13
>>> On 28.02.17 at 20:09, <andrew.cooper3@citrix.com> wrote:
> On 28/02/17 19:01, Daniel Kiper wrote:
>> Looking at this I do not have a better idea for fix off the top of my head.

Well, one better solution was already suggested: Put it in .init.data and
suppress the freeing of the used part of it. The other option is to put
a section along the lines of Linux'es .brk _after_ .bss (whether ahead
of or after _end would need to be seen), replacing the uses of
__2M_rwdata_end by a runtime calculated value (and freeing space
past the used portion accordingly, without shattering large pages if
used).

>> So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave
>> comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region.
>> Should I post a patch or whether you would like to do it?
> 
> The patch at the root of this thread is basically that.
> 
> As a stopgap solution to unblock staging, I think I should drop the
> adjustment of MB(1) to KB(32) and submit the patch.
> 
> This satisfies Jan's request to not make the current buffer smaller than
> it currently is, and will give us more time to discuss alternative
> solutions, at the cost of wasting 1MB of RAM.  (Not exactly breaking the
> bank these days, but definitely something which should be fixed before
> 4.9 ships.)

Yes, as an immediate band aid I can live with this.

Jan
Daniel Kiper March 1, 2017, 8:20 a.m. UTC | #14
On Wed, Mar 01, 2017 at 12:48:26AM -0700, Jan Beulich wrote:
> >>> On 28.02.17 at 19:45, <daniel.kiper@oracle.com> wrote:
> > On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote:
> >> >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote:
> >> > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote:
> >> >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote:
> >> >> > On 28/02/17 16:03, Jan Beulich wrote:
> >> >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
> >> >> >>> Freeing part of the BSS back for general use proves to be problematic.  It
> >> >> > is
> >> >> >>> not accounted for in xen_in_range(), causing errors when constructing the
> >> >> >>> IOMMU tables, resulting in a failure to boot.
> >> >> >>>
> >> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor
> >> >> > data,
> >> >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
> >> >> >>> freeing it guarentees to shatter the hypervisor superpage mappings.
> >> >> >>>
> >> >> >>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to a
> >> >> >>> more-reasonable 32kB and keep the entire buffer around after boot.
> >> >> >> Well, that's just because right now there's only a single user. The
> >> >> >> reason I refused Daniel making it smaller than its predecessor is
> >> >> >> that we can't really give a good estimate of how much data may
> >> >> >> need storing there: The memory map can have hundreds of entries
> >> >> >> and command lines for modules may also be almost arbitrarily long.
> >> >> >>
> >> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot
> >> >> >> services allocations here?
> >> >> >
> >> >> > From the original commit message,
> >> >> >
> >> >> >     1) We could use native EFI allocation functions (e.g. AllocatePool()
> >> >> >        or AllocatePages()) to get memory chunk. However, later (somewhere
> >> >> >        in __start_xen()) we must copy its contents to safe place or reserve
> >> >> >        it in e820 memory map and map it in Xen virtual address space.
> >> >>
> >> >> Reading this again, I have to admit that I don't understand why any
> >> >> copying or reserving would need to be done. We'd need to do runtime
> >> >> allocations, sure, but I would have thought this goes without saying.
> >> >
> >> > We discussed this once but I do not remember the thread. In general Xen EFI
> >> > boot code should allocate memory as EfiLoaderData. However, later in
> >> > efi_arch_process_memory_map() we treat this memory type as free. This means
> >> > that it can be overwritten. So, that is why I mentioned copy or reservation.
> >>
> >> There's nothing disallowing runtime data allocations, afaik.
> >
> > Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO.
> > AIUI, this is not intended to use in that way.
>
> And that you base on what statement(s) in the spec?

UEFI ver. 2.6, p. 154: In general, UEFI OS loaders and UEFI applications should
allocate memory (and pool) of type EfiLoaderData. UEFI boot service drivers must
allocate memory (and pool) of type EfiBootServicesData. UREFI runtime drivers
should allocate memory (and pool) of type EfiRuntimeServicesData (although such
allocation can only be made during boot services time).

UEFI ver. 2.6, p. 210, Default pool allocations from memory type row.

Xen is UEFI application, so, we should use EfiLoaderData. Of course we potentially
can use EfiRuntimeServicesData because it is not strictly forbidden. However, as you
can see above EfiRuntimeServicesData is targeted for UEFI runtime drivers and stuff
like that. So, that is why I said "not nice" in my earlier email.

Daniel
Jan Beulich March 1, 2017, 8:47 a.m. UTC | #15
>>> On 01.03.17 at 09:20, <daniel.kiper@oracle.com> wrote:
> On Wed, Mar 01, 2017 at 12:48:26AM -0700, Jan Beulich wrote:
>> >>> On 28.02.17 at 19:45, <daniel.kiper@oracle.com> wrote:
>> > On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote:
>> >> >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote:
>> >> > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote:
>> >> >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote:
>> >> >> > On 28/02/17 16:03, Jan Beulich wrote:
>> >> >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote:
>> >> >> >>> Freeing part of the BSS back for general use proves to be problematic.  
> It
>> >> >> > is
>> >> >> >>> not accounted for in xen_in_range(), causing errors when constructing the
>> >> >> >>> IOMMU tables, resulting in a failure to boot.
>> >> >> >>>
>> >> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor
>> >> >> > data,
>> >> >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size,
>> >> >> >>> freeing it guarentees to shatter the hypervisor superpage mappings.
>> >> >> >>>
>> >> >> >>> Judging by the content stored in it, 1MB is overkill on size.  Drop it to 
> a
>> >> >> >>> more-reasonable 32kB and keep the entire buffer around after boot.
>> >> >> >> Well, that's just because right now there's only a single user. The
>> >> >> >> reason I refused Daniel making it smaller than its predecessor is
>> >> >> >> that we can't really give a good estimate of how much data may
>> >> >> >> need storing there: The memory map can have hundreds of entries
>> >> >> >> and command lines for modules may also be almost arbitrarily long.
>> >> >> >>
>> >> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot
>> >> >> >> services allocations here?
>> >> >> >
>> >> >> > From the original commit message,
>> >> >> >
>> >> >> >     1) We could use native EFI allocation functions (e.g. AllocatePool()
>> >> >> >        or AllocatePages()) to get memory chunk. However, later (somewhere
>> >> >> >        in __start_xen()) we must copy its contents to safe place or 
> reserve
>> >> >> >        it in e820 memory map and map it in Xen virtual address space.
>> >> >>
>> >> >> Reading this again, I have to admit that I don't understand why any
>> >> >> copying or reserving would need to be done. We'd need to do runtime
>> >> >> allocations, sure, but I would have thought this goes without saying.
>> >> >
>> >> > We discussed this once but I do not remember the thread. In general Xen 
> EFI
>> >> > boot code should allocate memory as EfiLoaderData. However, later in
>> >> > efi_arch_process_memory_map() we treat this memory type as free. This 
> means
>> >> > that it can be overwritten. So, that is why I mentioned copy or 
> reservation.
>> >>
>> >> There's nothing disallowing runtime data allocations, afaik.
>> >
>> > Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO.
>> > AIUI, this is not intended to use in that way.
>>
>> And that you base on what statement(s) in the spec?
> 
> UEFI ver. 2.6, p. 154: In general, UEFI OS loaders and UEFI applications should
> allocate memory (and pool) of type EfiLoaderData. UEFI boot service drivers must
> allocate memory (and pool) of type EfiBootServicesData. UREFI runtime drivers
> should allocate memory (and pool) of type EfiRuntimeServicesData (although such
> allocation can only be made during boot services time).

Note the "In general ... should ...", as opposed to the "must" later on.
The situation here simply falls outside of the general case, and there's
a reason the spec isn't strict here. ARM, btw., already has such a use
in efi_arch_handle_cmdline().

Jan
Daniel Kiper March 1, 2017, 9 a.m. UTC | #16
On Wed, Mar 01, 2017 at 01:02:55AM -0700, Jan Beulich wrote:
> >>> On 28.02.17 at 20:09, <andrew.cooper3@citrix.com> wrote:
> > On 28/02/17 19:01, Daniel Kiper wrote:
> >> Looking at this I do not have a better idea for fix off the top of my head.
>
> Well, one better solution was already suggested: Put it in .init.data and

This will increase Xen binary size. So, I prefer to leave it in BSS if possible.

> suppress the freeing of the used part of it. The other option is to put
> a section along the lines of Linux'es .brk _after_ .bss (whether ahead
> of or after _end would need to be seen), replacing the uses of
> __2M_rwdata_end by a runtime calculated value (and freeing space
> past the used portion accordingly, without shattering large pages if
> used).

OK, make sense for me. However, from a bootloader point of view it should be
still part of BSS. Otherwise, the bootloader can put the end of Xen image too
close to reserved/no ram/etc. region. This does not seem important right now
but after applying the rest of my patch series it can make a pain.

Daniel
Jan Beulich March 1, 2017, 9:08 a.m. UTC | #17
>>> On 01.03.17 at 10:00, <daniel.kiper@oracle.com> wrote:
> On Wed, Mar 01, 2017 at 01:02:55AM -0700, Jan Beulich wrote:
>> >>> On 28.02.17 at 20:09, <andrew.cooper3@citrix.com> wrote:
>> > On 28/02/17 19:01, Daniel Kiper wrote:
>> >> Looking at this I do not have a better idea for fix off the top of my head.
>>
>> Well, one better solution was already suggested: Put it in .init.data and
> 
> This will increase Xen binary size. So, I prefer to leave it in BSS if 
> possible.

I don't think that's very important a criteria.

>> suppress the freeing of the used part of it. The other option is to put
>> a section along the lines of Linux'es .brk _after_ .bss (whether ahead
>> of or after _end would need to be seen), replacing the uses of
>> __2M_rwdata_end by a runtime calculated value (and freeing space
>> past the used portion accordingly, without shattering large pages if
>> used).
> 
> OK, make sense for me. However, from a bootloader point of view it should be
> still part of BSS. Otherwise, the bootloader can put the end of Xen image too
> close to reserved/no ram/etc. region. This does not seem important right now
> but after applying the rest of my patch series it can make a pain.

I'm afraid I don't understand: It needs to be part of the ELF
load segments or PE32+ loaded sections, but whether that's
part of .bss or an adjacent @nobits section at the source level
shouldn't matter at all.

Jan
Daniel Kiper March 1, 2017, 9:21 a.m. UTC | #18
On Wed, Mar 01, 2017 at 01:47:21AM -0700, Jan Beulich wrote:
> >>> On 01.03.17 at 09:20, <daniel.kiper@oracle.com> wrote:

[...]

> > UEFI ver. 2.6, p. 154: In general, UEFI OS loaders and UEFI applications should
> > allocate memory (and pool) of type EfiLoaderData. UEFI boot service drivers must
> > allocate memory (and pool) of type EfiBootServicesData. UREFI runtime drivers
> > should allocate memory (and pool) of type EfiRuntimeServicesData (although such
> > allocation can only be made during boot services time).
>
> Note the "In general ... should ...", as opposed to the "must" later on.
> The situation here simply falls outside of the general case, and there's
> a reason the spec isn't strict here. ARM, btw., already has such a use
> in efi_arch_handle_cmdline().

Yes, I see that. And that is why I say "not nice" not fundamentally wrong.

Daniel
Daniel Kiper March 1, 2017, 9:44 a.m. UTC | #19
On Wed, Mar 01, 2017 at 02:08:17AM -0700, Jan Beulich wrote:
> >>> On 01.03.17 at 10:00, <daniel.kiper@oracle.com> wrote:
> > On Wed, Mar 01, 2017 at 01:02:55AM -0700, Jan Beulich wrote:

[...]

> >> suppress the freeing of the used part of it. The other option is to put
> >> a section along the lines of Linux'es .brk _after_ .bss (whether ahead
> >> of or after _end would need to be seen), replacing the uses of
> >> __2M_rwdata_end by a runtime calculated value (and freeing space
> >> past the used portion accordingly, without shattering large pages if
> >> used).
> >
> > OK, make sense for me. However, from a bootloader point of view it should be
> > still part of BSS. Otherwise, the bootloader can put the end of Xen image too
> > close to reserved/no ram/etc. region. This does not seem important right now
> > but after applying the rest of my patch series it can make a pain.
>
> I'm afraid I don't understand: It needs to be part of the ELF
> load segments or PE32+ loaded sections, but whether that's
> part of .bss or an adjacent @nobits section at the source level
> shouldn't matter at all.

At source level I do not care much. And if it is a part "of the ELF load
segments" and "PE32+ loaded sections" then it is OK for me.

Daniel
diff mbox

Patch

diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index b6cbdad..82bf15b 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -120,11 +120,10 @@  static CHAR16 __initdata newline[] = L"\r\n";
  *     have to be disabled in xen/arch/arm/arm64/head.S; though BSS
  *     should be initialized somehow before use of variables living there,
  *   - use ebmalloc() in ARM/common EFI boot code,
- *   - call free_ebmalloc_unused_mem() somewhere in init code.
  */
 #define EBMALLOC_SIZE	MB(0)
 #else
-#define EBMALLOC_SIZE	MB(1)
+#define EBMALLOC_SIZE	KB(32)
 #endif
 
 static char __section(".bss.page_aligned") __aligned(PAGE_SIZE)
@@ -144,19 +143,6 @@  static void __init __maybe_unused *ebmalloc(size_t size)
     return ptr;
 }
 
-static void __init __maybe_unused free_ebmalloc_unused_mem(void)
-{
-    unsigned long start, end;
-
-    start = (unsigned long)ebmalloc_mem + PAGE_ALIGN(ebmalloc_allocated);
-    end = (unsigned long)ebmalloc_mem + sizeof(ebmalloc_mem);
-
-    destroy_xen_mappings(start, end);
-    init_xenheap_pages(__pa(start), __pa(end));
-
-    printk(XENLOG_INFO "Freed %lukB unused BSS memory\n", (end - start) >> 10);
-}
-
 /*
  * Include architecture specific implementation here, which references the
  * static globals defined above.
@@ -1310,8 +1296,6 @@  void __init efi_init_memory(void)
     } *extra, *extra_head = NULL;
 #endif
 
-    free_ebmalloc_unused_mem();
-
     if ( !efi_enabled(EFI_BOOT) )
         return;