Message ID | 1488295218-26057-2-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: > Freeing part of the BSS back for general use proves to be problematic. It is > not accounted for in xen_in_range(), causing errors when constructing the > IOMMU tables, resulting in a failure to boot. > > Other smaller issues are that tboot treats the entire BSS as hypervisor data, > creating and checking a MAC of it on S3, and that, by being 1MB in size, > freeing it guarentees to shatter the hypervisor superpage mappings. > > Judging by the content stored in it, 1MB is overkill on size. Drop it to a > more-reasonable 32kB and keep the entire buffer around after boot. Well, that's just because right now there's only a single user. The reason I refused Daniel making it smaller than its predecessor is that we can't really give a good estimate of how much data may need storing there: The memory map can have hundreds of entries and command lines for modules may also be almost arbitrarily long. What I don't recall, Daniel: Why was it that we can't use EFI boot services allocations here? Jan
On 28/02/17 16:03, Jan Beulich wrote: >>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: >> Freeing part of the BSS back for general use proves to be problematic. It is >> not accounted for in xen_in_range(), causing errors when constructing the >> IOMMU tables, resulting in a failure to boot. >> >> Other smaller issues are that tboot treats the entire BSS as hypervisor data, >> creating and checking a MAC of it on S3, and that, by being 1MB in size, >> freeing it guarentees to shatter the hypervisor superpage mappings. >> >> Judging by the content stored in it, 1MB is overkill on size. Drop it to a >> more-reasonable 32kB and keep the entire buffer around after boot. > Well, that's just because right now there's only a single user. The > reason I refused Daniel making it smaller than its predecessor is > that we can't really give a good estimate of how much data may > need storing there: The memory map can have hundreds of entries > and command lines for modules may also be almost arbitrarily long. > > What I don't recall, Daniel: Why was it that we can't use EFI boot > services allocations here? From the original commit message, 1) We could use native EFI allocation functions (e.g. AllocatePool() or AllocatePages()) to get memory chunk. However, later (somewhere in __start_xen()) we must copy its contents to safe place or reserve it in e820 memory map and map it in Xen virtual address space. This means that the code referring to Xen command line, loaded modules and EFI memory map, mostly in __start_xen(), will be further complicated and diverge from legacy BIOS cases. Additionally, both former things have to be placed below 4 GiB because their addresses are stored in multiboot_info_t structure which has 32-bit relevant members. One way or another, if we don't want to shatter superpages, we either must keep the entire allocation, or copy the content out later into a smaller location once other heap facilities are available. If we are copying data out, we might as well use EFI heap facilities rather than rolling our own. ~Andrew
>>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote: > On 28/02/17 16:03, Jan Beulich wrote: >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: >>> Freeing part of the BSS back for general use proves to be problematic. It > is >>> not accounted for in xen_in_range(), causing errors when constructing the >>> IOMMU tables, resulting in a failure to boot. >>> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor > data, >>> creating and checking a MAC of it on S3, and that, by being 1MB in size, >>> freeing it guarentees to shatter the hypervisor superpage mappings. >>> >>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a >>> more-reasonable 32kB and keep the entire buffer around after boot. >> Well, that's just because right now there's only a single user. The >> reason I refused Daniel making it smaller than its predecessor is >> that we can't really give a good estimate of how much data may >> need storing there: The memory map can have hundreds of entries >> and command lines for modules may also be almost arbitrarily long. >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot >> services allocations here? > > From the original commit message, > > 1) We could use native EFI allocation functions (e.g. AllocatePool() > or AllocatePages()) to get memory chunk. However, later (somewhere > in __start_xen()) we must copy its contents to safe place or reserve > it in e820 memory map and map it in Xen virtual address space. Reading this again, I have to admit that I don't understand why any copying or reserving would need to be done. We'd need to do runtime allocations, sure, but I would have thought this goes without saying. > This > means that the code referring to Xen command line, loaded modules and > EFI memory map, mostly in __start_xen(), will be further complicated > and diverge from legacy BIOS cases. Additionally, both former things > have to be placed below 4 GiB because their addresses are stored in > multiboot_info_t structure which has 32-bit relevant members. > > > One way or another, if we don't want to shatter superpages, we either > must keep the entire allocation, or copy the content out later into a > smaller location once other heap facilities are available. > > If we are copying data out, we might as well use EFI heap facilities > rather than rolling our own. Well, copying data later won't work, as there are pointers stored to the original allocation. Jan
On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote: > >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote: > > On 28/02/17 16:03, Jan Beulich wrote: > >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: > >>> Freeing part of the BSS back for general use proves to be problematic. It > > is > >>> not accounted for in xen_in_range(), causing errors when constructing the > >>> IOMMU tables, resulting in a failure to boot. > >>> > >>> Other smaller issues are that tboot treats the entire BSS as hypervisor > > data, > >>> creating and checking a MAC of it on S3, and that, by being 1MB in size, > >>> freeing it guarentees to shatter the hypervisor superpage mappings. > >>> > >>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a > >>> more-reasonable 32kB and keep the entire buffer around after boot. > >> Well, that's just because right now there's only a single user. The > >> reason I refused Daniel making it smaller than its predecessor is > >> that we can't really give a good estimate of how much data may > >> need storing there: The memory map can have hundreds of entries > >> and command lines for modules may also be almost arbitrarily long. > >> > >> What I don't recall, Daniel: Why was it that we can't use EFI boot > >> services allocations here? > > > > From the original commit message, > > > > 1) We could use native EFI allocation functions (e.g. AllocatePool() > > or AllocatePages()) to get memory chunk. However, later (somewhere > > in __start_xen()) we must copy its contents to safe place or reserve > > it in e820 memory map and map it in Xen virtual address space. > > Reading this again, I have to admit that I don't understand why any > copying or reserving would need to be done. We'd need to do runtime > allocations, sure, but I would have thought this goes without saying. We discussed this once but I do not remember the thread. In general Xen EFI boot code should allocate memory as EfiLoaderData. However, later in efi_arch_process_memory_map() we treat this memory type as free. This means that it can be overwritten. So, that is why I mentioned copy or reservation. > > This > > means that the code referring to Xen command line, loaded modules and > > EFI memory map, mostly in __start_xen(), will be further complicated > > and diverge from legacy BIOS cases. Additionally, both former things > > have to be placed below 4 GiB because their addresses are stored in > > multiboot_info_t structure which has 32-bit relevant members. > > > > > > One way or another, if we don't want to shatter superpages, we either > > must keep the entire allocation, or copy the content out later into a > > smaller location once other heap facilities are available. > > > > If we are copying data out, we might as well use EFI heap facilities > > rather than rolling our own. > > Well, copying data later won't work, as there are pointers stored to > the original allocation. This is not impossible. Just requires fiddling with mbi struct. Daniel
>>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote: > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote: >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote: >> > On 28/02/17 16:03, Jan Beulich wrote: >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: >> >>> Freeing part of the BSS back for general use proves to be problematic. It >> > is >> >>> not accounted for in xen_in_range(), causing errors when constructing the >> >>> IOMMU tables, resulting in a failure to boot. >> >>> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor >> > data, >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size, >> >>> freeing it guarentees to shatter the hypervisor superpage mappings. >> >>> >> >>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a >> >>> more-reasonable 32kB and keep the entire buffer around after boot. >> >> Well, that's just because right now there's only a single user. The >> >> reason I refused Daniel making it smaller than its predecessor is >> >> that we can't really give a good estimate of how much data may >> >> need storing there: The memory map can have hundreds of entries >> >> and command lines for modules may also be almost arbitrarily long. >> >> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot >> >> services allocations here? >> > >> > From the original commit message, >> > >> > 1) We could use native EFI allocation functions (e.g. AllocatePool() >> > or AllocatePages()) to get memory chunk. However, later (somewhere >> > in __start_xen()) we must copy its contents to safe place or reserve >> > it in e820 memory map and map it in Xen virtual address space. >> >> Reading this again, I have to admit that I don't understand why any >> copying or reserving would need to be done. We'd need to do runtime >> allocations, sure, but I would have thought this goes without saying. > > We discussed this once but I do not remember the thread. In general Xen EFI > boot code should allocate memory as EfiLoaderData. However, later in > efi_arch_process_memory_map() we treat this memory type as free. This means > that it can be overwritten. So, that is why I mentioned copy or reservation. There's nothing disallowing runtime data allocations, afaik. >> > This >> > means that the code referring to Xen command line, loaded modules and >> > EFI memory map, mostly in __start_xen(), will be further complicated >> > and diverge from legacy BIOS cases. Additionally, both former things >> > have to be placed below 4 GiB because their addresses are stored in >> > multiboot_info_t structure which has 32-bit relevant members. >> > >> > >> > One way or another, if we don't want to shatter superpages, we either >> > must keep the entire allocation, or copy the content out later into a >> > smaller location once other heap facilities are available. >> > >> > If we are copying data out, we might as well use EFI heap facilities >> > rather than rolling our own. >> >> Well, copying data later won't work, as there are pointers stored to >> the original allocation. > > This is not impossible. Just requires fiddling with mbi struct. That would cover this one specific use only, and even then how do you know there weren't any derived pointers stored elsewhere? Jan
On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote: > On 28/02/17 16:03, Jan Beulich wrote: > >>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: > >> Freeing part of the BSS back for general use proves to be problematic. It is > >> not accounted for in xen_in_range(), causing errors when constructing the > >> IOMMU tables, resulting in a failure to boot. > >> > >> Other smaller issues are that tboot treats the entire BSS as hypervisor data, > >> creating and checking a MAC of it on S3, and that, by being 1MB in size, > >> freeing it guarentees to shatter the hypervisor superpage mappings. > >> > >> Judging by the content stored in it, 1MB is overkill on size. Drop it to a > >> more-reasonable 32kB and keep the entire buffer around after boot. > > Well, that's just because right now there's only a single user. The > > reason I refused Daniel making it smaller than its predecessor is > > that we can't really give a good estimate of how much data may > > need storing there: The memory map can have hundreds of entries > > and command lines for modules may also be almost arbitrarily long. > > > > What I don't recall, Daniel: Why was it that we can't use EFI boot > > services allocations here? > > From the original commit message, > > 1) We could use native EFI allocation functions (e.g. AllocatePool() > or AllocatePages()) to get memory chunk. However, later (somewhere > in __start_xen()) we must copy its contents to safe place or reserve > it in e820 memory map and map it in Xen virtual address space. This > means that the code referring to Xen command line, loaded modules and > EFI memory map, mostly in __start_xen(), will be further complicated > and diverge from legacy BIOS cases. Additionally, both former things > have to be placed below 4 GiB because their addresses are stored in > multiboot_info_t structure which has 32-bit relevant members. > > > One way or another, if we don't want to shatter superpages, we either > must keep the entire allocation, or copy the content out later into a > smaller location once other heap facilities are available. Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not? In regards to tboot I think that we should just take into account during calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem region to the end of BSS would help somehow here. Daniel
On 28/02/17 17:41, Daniel Kiper wrote: > On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote: >> On 28/02/17 16:03, Jan Beulich wrote: >>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: >>>> Freeing part of the BSS back for general use proves to be problematic. It is >>>> not accounted for in xen_in_range(), causing errors when constructing the >>>> IOMMU tables, resulting in a failure to boot. >>>> >>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data, >>>> creating and checking a MAC of it on S3, and that, by being 1MB in size, >>>> freeing it guarentees to shatter the hypervisor superpage mappings. >>>> >>>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a >>>> more-reasonable 32kB and keep the entire buffer around after boot. >>> Well, that's just because right now there's only a single user. The >>> reason I refused Daniel making it smaller than its predecessor is >>> that we can't really give a good estimate of how much data may >>> need storing there: The memory map can have hundreds of entries >>> and command lines for modules may also be almost arbitrarily long. >>> >>> What I don't recall, Daniel: Why was it that we can't use EFI boot >>> services allocations here? >> From the original commit message, >> >> 1) We could use native EFI allocation functions (e.g. AllocatePool() >> or AllocatePages()) to get memory chunk. However, later (somewhere >> in __start_xen()) we must copy its contents to safe place or reserve >> it in e820 memory map and map it in Xen virtual address space. This >> means that the code referring to Xen command line, loaded modules and >> EFI memory map, mostly in __start_xen(), will be further complicated >> and diverge from legacy BIOS cases. Additionally, both former things >> have to be placed below 4 GiB because their addresses are stored in >> multiboot_info_t structure which has 32-bit relevant members. >> >> >> One way or another, if we don't want to shatter superpages, we either >> must keep the entire allocation, or copy the content out later into a >> smaller location once other heap facilities are available. > Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not? Xen is purposefully laid out like this: .text, 2M aligned, R/X .rodata, 2M aligned, R/NX .init.*, 2M aligned, R/W/X (reclaimed) .data + .bss, 2M aligned, R/W/NX (In reality there is a syslinux bug which caused me to revert the 2M alignment in non-EFI builds, so we are still using 4k alignment, but I need to find a way to work around this and move back to enforcing the 2M alignment.) When .init gets reclaimed, this leaves a (deliberate) hole which is all 2M aligned, which has no impact on the adjacent 2M superpages. When ebmalloc gets reclaimed, it must shatter one or two superpages mapping the .data/.bss section. > In regards to tboot I think that we should just take into account during > calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem > region to the end of BSS would help somehow here. At the moment, all the ranges of Xen VA space are bounded by linker symbols. It is certainly possible to could account for the ebmalloc object in the middle of the BSS, but it is turning into a layering violation. ~Andrew
On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote: > >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote: > > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote: > >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote: > >> > On 28/02/17 16:03, Jan Beulich wrote: > >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: > >> >>> Freeing part of the BSS back for general use proves to be problematic. It > >> > is > >> >>> not accounted for in xen_in_range(), causing errors when constructing the > >> >>> IOMMU tables, resulting in a failure to boot. > >> >>> > >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor > >> > data, > >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size, > >> >>> freeing it guarentees to shatter the hypervisor superpage mappings. > >> >>> > >> >>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a > >> >>> more-reasonable 32kB and keep the entire buffer around after boot. > >> >> Well, that's just because right now there's only a single user. The > >> >> reason I refused Daniel making it smaller than its predecessor is > >> >> that we can't really give a good estimate of how much data may > >> >> need storing there: The memory map can have hundreds of entries > >> >> and command lines for modules may also be almost arbitrarily long. > >> >> > >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot > >> >> services allocations here? > >> > > >> > From the original commit message, > >> > > >> > 1) We could use native EFI allocation functions (e.g. AllocatePool() > >> > or AllocatePages()) to get memory chunk. However, later (somewhere > >> > in __start_xen()) we must copy its contents to safe place or reserve > >> > it in e820 memory map and map it in Xen virtual address space. > >> > >> Reading this again, I have to admit that I don't understand why any > >> copying or reserving would need to be done. We'd need to do runtime > >> allocations, sure, but I would have thought this goes without saying. > > > > We discussed this once but I do not remember the thread. In general Xen EFI > > boot code should allocate memory as EfiLoaderData. However, later in > > efi_arch_process_memory_map() we treat this memory type as free. This means > > that it can be overwritten. So, that is why I mentioned copy or reservation. > > There's nothing disallowing runtime data allocations, afaik. Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO. AIUI, this is not intended to use in that way. > >> > This > >> > means that the code referring to Xen command line, loaded modules and > >> > EFI memory map, mostly in __start_xen(), will be further complicated > >> > and diverge from legacy BIOS cases. Additionally, both former things > >> > have to be placed below 4 GiB because their addresses are stored in > >> > multiboot_info_t structure which has 32-bit relevant members. > >> > > >> > > >> > One way or another, if we don't want to shatter superpages, we either > >> > must keep the entire allocation, or copy the content out later into a > >> > smaller location once other heap facilities are available. > >> > > >> > If we are copying data out, we might as well use EFI heap facilities > >> > rather than rolling our own. > >> > >> Well, copying data later won't work, as there are pointers stored to > >> the original allocation. > > > > This is not impossible. Just requires fiddling with mbi struct. > > That would cover this one specific use only, and even then how > do you know there weren't any derived pointers stored elsewhere? Right, we should avoid it. It is too fragile approach. Daniel
On Tue, Feb 28, 2017 at 05:58:26PM +0000, Andrew Cooper wrote: > On 28/02/17 17:41, Daniel Kiper wrote: > > On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote: > >> On 28/02/17 16:03, Jan Beulich wrote: > >>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: > >>>> Freeing part of the BSS back for general use proves to be problematic. It is > >>>> not accounted for in xen_in_range(), causing errors when constructing the > >>>> IOMMU tables, resulting in a failure to boot. > >>>> > >>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data, > >>>> creating and checking a MAC of it on S3, and that, by being 1MB in size, > >>>> freeing it guarentees to shatter the hypervisor superpage mappings. > >>>> > >>>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a > >>>> more-reasonable 32kB and keep the entire buffer around after boot. > >>> Well, that's just because right now there's only a single user. The > >>> reason I refused Daniel making it smaller than its predecessor is > >>> that we can't really give a good estimate of how much data may > >>> need storing there: The memory map can have hundreds of entries > >>> and command lines for modules may also be almost arbitrarily long. > >>> > >>> What I don't recall, Daniel: Why was it that we can't use EFI boot > >>> services allocations here? > >> From the original commit message, > >> > >> 1) We could use native EFI allocation functions (e.g. AllocatePool() > >> or AllocatePages()) to get memory chunk. However, later (somewhere > >> in __start_xen()) we must copy its contents to safe place or reserve > >> it in e820 memory map and map it in Xen virtual address space. This > >> means that the code referring to Xen command line, loaded modules and > >> EFI memory map, mostly in __start_xen(), will be further complicated > >> and diverge from legacy BIOS cases. Additionally, both former things > >> have to be placed below 4 GiB because their addresses are stored in > >> multiboot_info_t structure which has 32-bit relevant members. > >> > >> > >> One way or another, if we don't want to shatter superpages, we either > >> must keep the entire allocation, or copy the content out later into a > >> smaller location once other heap facilities are available. > > Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not? > > Xen is purposefully laid out like this: > > .text, 2M aligned, R/X > .rodata, 2M aligned, R/NX > .init.*, 2M aligned, R/W/X (reclaimed) > .data + .bss, 2M aligned, R/W/NX AIUI, the purpose is to have properly mapped (in terms of R/W/X/NX) sections. Right? > (In reality there is a syslinux bug which caused me to revert the 2M > alignment in non-EFI builds, so we are still using 4k alignment, but I > need to find a way to work around this and move back to enforcing the 2M > alignment.) > > When .init gets reclaimed, this leaves a (deliberate) hole which is all > 2M aligned, which has no impact on the adjacent 2M superpages. > > When ebmalloc gets reclaimed, it must shatter one or two superpages > mapping the .data/.bss section. Looking at this I do not have a better idea for fix off the top of my head. So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region. Should I post a patch or whether you would like to do it? > > In regards to tboot I think that we should just take into account during > > calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem > > region to the end of BSS would help somehow here. > > At the moment, all the ranges of Xen VA space are bounded by linker symbols. > > It is certainly possible to could account for the ebmalloc object in the > middle of the BSS, but it is turning into a layering violation. I am not sure what do you mean by "layering violation". Daniel
On 28/02/17 19:01, Daniel Kiper wrote: > On Tue, Feb 28, 2017 at 05:58:26PM +0000, Andrew Cooper wrote: >> On 28/02/17 17:41, Daniel Kiper wrote: >>> On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote: >>>> On 28/02/17 16:03, Jan Beulich wrote: >>>>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: >>>>>> Freeing part of the BSS back for general use proves to be problematic. It is >>>>>> not accounted for in xen_in_range(), causing errors when constructing the >>>>>> IOMMU tables, resulting in a failure to boot. >>>>>> >>>>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data, >>>>>> creating and checking a MAC of it on S3, and that, by being 1MB in size, >>>>>> freeing it guarentees to shatter the hypervisor superpage mappings. >>>>>> >>>>>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a >>>>>> more-reasonable 32kB and keep the entire buffer around after boot. >>>>> Well, that's just because right now there's only a single user. The >>>>> reason I refused Daniel making it smaller than its predecessor is >>>>> that we can't really give a good estimate of how much data may >>>>> need storing there: The memory map can have hundreds of entries >>>>> and command lines for modules may also be almost arbitrarily long. >>>>> >>>>> What I don't recall, Daniel: Why was it that we can't use EFI boot >>>>> services allocations here? >>>> From the original commit message, >>>> >>>> 1) We could use native EFI allocation functions (e.g. AllocatePool() >>>> or AllocatePages()) to get memory chunk. However, later (somewhere >>>> in __start_xen()) we must copy its contents to safe place or reserve >>>> it in e820 memory map and map it in Xen virtual address space. This >>>> means that the code referring to Xen command line, loaded modules and >>>> EFI memory map, mostly in __start_xen(), will be further complicated >>>> and diverge from legacy BIOS cases. Additionally, both former things >>>> have to be placed below 4 GiB because their addresses are stored in >>>> multiboot_info_t structure which has 32-bit relevant members. >>>> >>>> >>>> One way or another, if we don't want to shatter superpages, we either >>>> must keep the entire allocation, or copy the content out later into a >>>> smaller location once other heap facilities are available. >>> Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not? >> Xen is purposefully laid out like this: >> >> .text, 2M aligned, R/X >> .rodata, 2M aligned, R/NX >> .init.*, 2M aligned, R/W/X (reclaimed) >> .data + .bss, 2M aligned, R/W/NX > AIUI, the purpose is to have properly mapped (in terms of R/W/X/NX) > sections. Right? When I did the original work, it was both to get proper pagetable permissions, and to get all of the compiled bits of Xen living in 2M superpages (which allows the entire hypervisor to operate in 5 TLB entries!) > >> (In reality there is a syslinux bug which caused me to revert the 2M >> alignment in non-EFI builds, so we are still using 4k alignment, but I >> need to find a way to work around this and move back to enforcing the 2M >> alignment.) >> >> When .init gets reclaimed, this leaves a (deliberate) hole which is all >> 2M aligned, which has no impact on the adjacent 2M superpages. >> >> When ebmalloc gets reclaimed, it must shatter one or two superpages >> mapping the .data/.bss section. > Looking at this I do not have a better idea for fix off the top of my head. > So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave > comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region. > Should I post a patch or whether you would like to do it? The patch at the root of this thread is basically that. As a stopgap solution to unblock staging, I think I should drop the adjustment of MB(1) to KB(32) and submit the patch. This satisfies Jan's request to not make the current buffer smaller than it currently is, and will give us more time to discuss alternative solutions, at the cost of wasting 1MB of RAM. (Not exactly breaking the bank these days, but definitely something which should be fixed before 4.9 ships.) > >>> In regards to tboot I think that we should just take into account during >>> calculation ebmalloc_mem allocated part only. Maybe move of ebmalloc_mem >>> region to the end of BSS would help somehow here. >> At the moment, all the ranges of Xen VA space are bounded by linker symbols. >> >> It is certainly possible to could account for the ebmalloc object in the >> middle of the BSS, but it is turning into a layering violation. > I am not sure what do you mean by "layering violation". It feels hacky, and the wrong way to solve the problem. ~Andrew
On Tue, Feb 28, 2017 at 07:09:14PM +0000, Andrew Cooper wrote: > On 28/02/17 19:01, Daniel Kiper wrote: > > On Tue, Feb 28, 2017 at 05:58:26PM +0000, Andrew Cooper wrote: > >> On 28/02/17 17:41, Daniel Kiper wrote: > >>> On Tue, Feb 28, 2017 at 04:08:35PM +0000, Andrew Cooper wrote: > >>>> On 28/02/17 16:03, Jan Beulich wrote: > >>>>>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: > >>>>>> Freeing part of the BSS back for general use proves to be problematic. It is > >>>>>> not accounted for in xen_in_range(), causing errors when constructing the > >>>>>> IOMMU tables, resulting in a failure to boot. > >>>>>> > >>>>>> Other smaller issues are that tboot treats the entire BSS as hypervisor data, > >>>>>> creating and checking a MAC of it on S3, and that, by being 1MB in size, > >>>>>> freeing it guarentees to shatter the hypervisor superpage mappings. > >>>>>> > >>>>>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a > >>>>>> more-reasonable 32kB and keep the entire buffer around after boot. > >>>>> Well, that's just because right now there's only a single user. The > >>>>> reason I refused Daniel making it smaller than its predecessor is > >>>>> that we can't really give a good estimate of how much data may > >>>>> need storing there: The memory map can have hundreds of entries > >>>>> and command lines for modules may also be almost arbitrarily long. > >>>>> > >>>>> What I don't recall, Daniel: Why was it that we can't use EFI boot > >>>>> services allocations here? > >>>> From the original commit message, > >>>> > >>>> 1) We could use native EFI allocation functions (e.g. AllocatePool() > >>>> or AllocatePages()) to get memory chunk. However, later (somewhere > >>>> in __start_xen()) we must copy its contents to safe place or reserve > >>>> it in e820 memory map and map it in Xen virtual address space. This > >>>> means that the code referring to Xen command line, loaded modules and > >>>> EFI memory map, mostly in __start_xen(), will be further complicated > >>>> and diverge from legacy BIOS cases. Additionally, both former things > >>>> have to be placed below 4 GiB because their addresses are stored in > >>>> multiboot_info_t structure which has 32-bit relevant members. > >>>> > >>>> > >>>> One way or another, if we don't want to shatter superpages, we either > >>>> must keep the entire allocation, or copy the content out later into a > >>>> smaller location once other heap facilities are available. > >>> Hmmm... Why BSS free "shatter superpages" and .init.* sections free does not? > >> Xen is purposefully laid out like this: > >> > >> .text, 2M aligned, R/X > >> .rodata, 2M aligned, R/NX > >> .init.*, 2M aligned, R/W/X (reclaimed) > >> .data + .bss, 2M aligned, R/W/NX > > AIUI, the purpose is to have properly mapped (in terms of R/W/X/NX) > > sections. Right? > > When I did the original work, it was both to get proper pagetable > permissions, and to get all of the compiled bits of Xen living in 2M > superpages (which allows the entire hypervisor to operate in 5 TLB entries!) Thanks for explanation. > >> (In reality there is a syslinux bug which caused me to revert the 2M > >> alignment in non-EFI builds, so we are still using 4k alignment, but I > >> need to find a way to work around this and move back to enforcing the 2M > >> alignment.) > >> > >> When .init gets reclaimed, this leaves a (deliberate) hole which is all > >> 2M aligned, which has no impact on the adjacent 2M superpages. > >> > >> When ebmalloc gets reclaimed, it must shatter one or two superpages > >> mapping the .data/.bss section. > > Looking at this I do not have a better idea for fix off the top of my head. > > So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave > > comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region. > > Should I post a patch or whether you would like to do it? > > The patch at the root of this thread is basically that. Yep, I know. > As a stopgap solution to unblock staging, I think I should drop the > adjustment of MB(1) to KB(32) and submit the patch. Sounds like a plan. > This satisfies Jan's request to not make the current buffer smaller than > it currently is, and will give us more time to discuss alternative > solutions, at the cost of wasting 1MB of RAM. (Not exactly breaking the > bank these days, but definitely something which should be fixed before > 4.9 ships.) Granted! Daniel
>>> On 28.02.17 at 19:45, <daniel.kiper@oracle.com> wrote: > On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote: >> >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote: >> > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote: >> >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote: >> >> > On 28/02/17 16:03, Jan Beulich wrote: >> >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: >> >> >>> Freeing part of the BSS back for general use proves to be problematic. It >> >> > is >> >> >>> not accounted for in xen_in_range(), causing errors when constructing the >> >> >>> IOMMU tables, resulting in a failure to boot. >> >> >>> >> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor >> >> > data, >> >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size, >> >> >>> freeing it guarentees to shatter the hypervisor superpage mappings. >> >> >>> >> >> >>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a >> >> >>> more-reasonable 32kB and keep the entire buffer around after boot. >> >> >> Well, that's just because right now there's only a single user. The >> >> >> reason I refused Daniel making it smaller than its predecessor is >> >> >> that we can't really give a good estimate of how much data may >> >> >> need storing there: The memory map can have hundreds of entries >> >> >> and command lines for modules may also be almost arbitrarily long. >> >> >> >> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot >> >> >> services allocations here? >> >> > >> >> > From the original commit message, >> >> > >> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool() >> >> > or AllocatePages()) to get memory chunk. However, later (somewhere >> >> > in __start_xen()) we must copy its contents to safe place or reserve >> >> > it in e820 memory map and map it in Xen virtual address space. >> >> >> >> Reading this again, I have to admit that I don't understand why any >> >> copying or reserving would need to be done. We'd need to do runtime >> >> allocations, sure, but I would have thought this goes without saying. >> > >> > We discussed this once but I do not remember the thread. In general Xen EFI >> > boot code should allocate memory as EfiLoaderData. However, later in >> > efi_arch_process_memory_map() we treat this memory type as free. This means >> > that it can be overwritten. So, that is why I mentioned copy or reservation. >> >> There's nothing disallowing runtime data allocations, afaik. > > Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO. > AIUI, this is not intended to use in that way. And that you base on what statement(s) in the spec? Jan
>>> On 28.02.17 at 20:09, <andrew.cooper3@citrix.com> wrote: > On 28/02/17 19:01, Daniel Kiper wrote: >> Looking at this I do not have a better idea for fix off the top of my head. Well, one better solution was already suggested: Put it in .init.data and suppress the freeing of the used part of it. The other option is to put a section along the lines of Linux'es .brk _after_ .bss (whether ahead of or after _end would need to be seen), replacing the uses of __2M_rwdata_end by a runtime calculated value (and freeing space past the used portion accordingly, without shattering large pages if used). >> So, I have a feeling that we should drop free_ebmalloc_unused_mem() and leave >> comment around ebmalloc() why we do not reclaim unused ebmalloc_mem region. >> Should I post a patch or whether you would like to do it? > > The patch at the root of this thread is basically that. > > As a stopgap solution to unblock staging, I think I should drop the > adjustment of MB(1) to KB(32) and submit the patch. > > This satisfies Jan's request to not make the current buffer smaller than > it currently is, and will give us more time to discuss alternative > solutions, at the cost of wasting 1MB of RAM. (Not exactly breaking the > bank these days, but definitely something which should be fixed before > 4.9 ships.) Yes, as an immediate band aid I can live with this. Jan
On Wed, Mar 01, 2017 at 12:48:26AM -0700, Jan Beulich wrote: > >>> On 28.02.17 at 19:45, <daniel.kiper@oracle.com> wrote: > > On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote: > >> >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote: > >> > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote: > >> >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote: > >> >> > On 28/02/17 16:03, Jan Beulich wrote: > >> >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: > >> >> >>> Freeing part of the BSS back for general use proves to be problematic. It > >> >> > is > >> >> >>> not accounted for in xen_in_range(), causing errors when constructing the > >> >> >>> IOMMU tables, resulting in a failure to boot. > >> >> >>> > >> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor > >> >> > data, > >> >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size, > >> >> >>> freeing it guarentees to shatter the hypervisor superpage mappings. > >> >> >>> > >> >> >>> Judging by the content stored in it, 1MB is overkill on size. Drop it to a > >> >> >>> more-reasonable 32kB and keep the entire buffer around after boot. > >> >> >> Well, that's just because right now there's only a single user. The > >> >> >> reason I refused Daniel making it smaller than its predecessor is > >> >> >> that we can't really give a good estimate of how much data may > >> >> >> need storing there: The memory map can have hundreds of entries > >> >> >> and command lines for modules may also be almost arbitrarily long. > >> >> >> > >> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot > >> >> >> services allocations here? > >> >> > > >> >> > From the original commit message, > >> >> > > >> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool() > >> >> > or AllocatePages()) to get memory chunk. However, later (somewhere > >> >> > in __start_xen()) we must copy its contents to safe place or reserve > >> >> > it in e820 memory map and map it in Xen virtual address space. > >> >> > >> >> Reading this again, I have to admit that I don't understand why any > >> >> copying or reserving would need to be done. We'd need to do runtime > >> >> allocations, sure, but I would have thought this goes without saying. > >> > > >> > We discussed this once but I do not remember the thread. In general Xen EFI > >> > boot code should allocate memory as EfiLoaderData. However, later in > >> > efi_arch_process_memory_map() we treat this memory type as free. This means > >> > that it can be overwritten. So, that is why I mentioned copy or reservation. > >> > >> There's nothing disallowing runtime data allocations, afaik. > > > > Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO. > > AIUI, this is not intended to use in that way. > > And that you base on what statement(s) in the spec? UEFI ver. 2.6, p. 154: In general, UEFI OS loaders and UEFI applications should allocate memory (and pool) of type EfiLoaderData. UEFI boot service drivers must allocate memory (and pool) of type EfiBootServicesData. UREFI runtime drivers should allocate memory (and pool) of type EfiRuntimeServicesData (although such allocation can only be made during boot services time). UEFI ver. 2.6, p. 210, Default pool allocations from memory type row. Xen is UEFI application, so, we should use EfiLoaderData. Of course we potentially can use EfiRuntimeServicesData because it is not strictly forbidden. However, as you can see above EfiRuntimeServicesData is targeted for UEFI runtime drivers and stuff like that. So, that is why I said "not nice" in my earlier email. Daniel
>>> On 01.03.17 at 09:20, <daniel.kiper@oracle.com> wrote: > On Wed, Mar 01, 2017 at 12:48:26AM -0700, Jan Beulich wrote: >> >>> On 28.02.17 at 19:45, <daniel.kiper@oracle.com> wrote: >> > On Tue, Feb 28, 2017 at 10:24:36AM -0700, Jan Beulich wrote: >> >> >>> On 28.02.17 at 18:09, <daniel.kiper@oracle.com> wrote: >> >> > On Tue, Feb 28, 2017 at 09:14:24AM -0700, Jan Beulich wrote: >> >> >> >>> On 28.02.17 at 17:08, <andrew.cooper3@citrix.com> wrote: >> >> >> > On 28/02/17 16:03, Jan Beulich wrote: >> >> >> >>>>> On 28.02.17 at 16:20, <andrew.cooper3@citrix.com> wrote: >> >> >> >>> Freeing part of the BSS back for general use proves to be problematic. > It >> >> >> > is >> >> >> >>> not accounted for in xen_in_range(), causing errors when constructing the >> >> >> >>> IOMMU tables, resulting in a failure to boot. >> >> >> >>> >> >> >> >>> Other smaller issues are that tboot treats the entire BSS as hypervisor >> >> >> > data, >> >> >> >>> creating and checking a MAC of it on S3, and that, by being 1MB in size, >> >> >> >>> freeing it guarentees to shatter the hypervisor superpage mappings. >> >> >> >>> >> >> >> >>> Judging by the content stored in it, 1MB is overkill on size. Drop it to > a >> >> >> >>> more-reasonable 32kB and keep the entire buffer around after boot. >> >> >> >> Well, that's just because right now there's only a single user. The >> >> >> >> reason I refused Daniel making it smaller than its predecessor is >> >> >> >> that we can't really give a good estimate of how much data may >> >> >> >> need storing there: The memory map can have hundreds of entries >> >> >> >> and command lines for modules may also be almost arbitrarily long. >> >> >> >> >> >> >> >> What I don't recall, Daniel: Why was it that we can't use EFI boot >> >> >> >> services allocations here? >> >> >> > >> >> >> > From the original commit message, >> >> >> > >> >> >> > 1) We could use native EFI allocation functions (e.g. AllocatePool() >> >> >> > or AllocatePages()) to get memory chunk. However, later (somewhere >> >> >> > in __start_xen()) we must copy its contents to safe place or > reserve >> >> >> > it in e820 memory map and map it in Xen virtual address space. >> >> >> >> >> >> Reading this again, I have to admit that I don't understand why any >> >> >> copying or reserving would need to be done. We'd need to do runtime >> >> >> allocations, sure, but I would have thought this goes without saying. >> >> > >> >> > We discussed this once but I do not remember the thread. In general Xen > EFI >> >> > boot code should allocate memory as EfiLoaderData. However, later in >> >> > efi_arch_process_memory_map() we treat this memory type as free. This > means >> >> > that it can be overwritten. So, that is why I mentioned copy or > reservation. >> >> >> >> There's nothing disallowing runtime data allocations, afaik. >> > >> > Do you mean EfiRuntimeServicesData? Probably possible but not nice IMO. >> > AIUI, this is not intended to use in that way. >> >> And that you base on what statement(s) in the spec? > > UEFI ver. 2.6, p. 154: In general, UEFI OS loaders and UEFI applications should > allocate memory (and pool) of type EfiLoaderData. UEFI boot service drivers must > allocate memory (and pool) of type EfiBootServicesData. UREFI runtime drivers > should allocate memory (and pool) of type EfiRuntimeServicesData (although such > allocation can only be made during boot services time). Note the "In general ... should ...", as opposed to the "must" later on. The situation here simply falls outside of the general case, and there's a reason the spec isn't strict here. ARM, btw., already has such a use in efi_arch_handle_cmdline(). Jan
On Wed, Mar 01, 2017 at 01:02:55AM -0700, Jan Beulich wrote: > >>> On 28.02.17 at 20:09, <andrew.cooper3@citrix.com> wrote: > > On 28/02/17 19:01, Daniel Kiper wrote: > >> Looking at this I do not have a better idea for fix off the top of my head. > > Well, one better solution was already suggested: Put it in .init.data and This will increase Xen binary size. So, I prefer to leave it in BSS if possible. > suppress the freeing of the used part of it. The other option is to put > a section along the lines of Linux'es .brk _after_ .bss (whether ahead > of or after _end would need to be seen), replacing the uses of > __2M_rwdata_end by a runtime calculated value (and freeing space > past the used portion accordingly, without shattering large pages if > used). OK, make sense for me. However, from a bootloader point of view it should be still part of BSS. Otherwise, the bootloader can put the end of Xen image too close to reserved/no ram/etc. region. This does not seem important right now but after applying the rest of my patch series it can make a pain. Daniel
>>> On 01.03.17 at 10:00, <daniel.kiper@oracle.com> wrote: > On Wed, Mar 01, 2017 at 01:02:55AM -0700, Jan Beulich wrote: >> >>> On 28.02.17 at 20:09, <andrew.cooper3@citrix.com> wrote: >> > On 28/02/17 19:01, Daniel Kiper wrote: >> >> Looking at this I do not have a better idea for fix off the top of my head. >> >> Well, one better solution was already suggested: Put it in .init.data and > > This will increase Xen binary size. So, I prefer to leave it in BSS if > possible. I don't think that's very important a criteria. >> suppress the freeing of the used part of it. The other option is to put >> a section along the lines of Linux'es .brk _after_ .bss (whether ahead >> of or after _end would need to be seen), replacing the uses of >> __2M_rwdata_end by a runtime calculated value (and freeing space >> past the used portion accordingly, without shattering large pages if >> used). > > OK, make sense for me. However, from a bootloader point of view it should be > still part of BSS. Otherwise, the bootloader can put the end of Xen image too > close to reserved/no ram/etc. region. This does not seem important right now > but after applying the rest of my patch series it can make a pain. I'm afraid I don't understand: It needs to be part of the ELF load segments or PE32+ loaded sections, but whether that's part of .bss or an adjacent @nobits section at the source level shouldn't matter at all. Jan
On Wed, Mar 01, 2017 at 01:47:21AM -0700, Jan Beulich wrote: > >>> On 01.03.17 at 09:20, <daniel.kiper@oracle.com> wrote: [...] > > UEFI ver. 2.6, p. 154: In general, UEFI OS loaders and UEFI applications should > > allocate memory (and pool) of type EfiLoaderData. UEFI boot service drivers must > > allocate memory (and pool) of type EfiBootServicesData. UREFI runtime drivers > > should allocate memory (and pool) of type EfiRuntimeServicesData (although such > > allocation can only be made during boot services time). > > Note the "In general ... should ...", as opposed to the "must" later on. > The situation here simply falls outside of the general case, and there's > a reason the spec isn't strict here. ARM, btw., already has such a use > in efi_arch_handle_cmdline(). Yes, I see that. And that is why I say "not nice" not fundamentally wrong. Daniel
On Wed, Mar 01, 2017 at 02:08:17AM -0700, Jan Beulich wrote: > >>> On 01.03.17 at 10:00, <daniel.kiper@oracle.com> wrote: > > On Wed, Mar 01, 2017 at 01:02:55AM -0700, Jan Beulich wrote: [...] > >> suppress the freeing of the used part of it. The other option is to put > >> a section along the lines of Linux'es .brk _after_ .bss (whether ahead > >> of or after _end would need to be seen), replacing the uses of > >> __2M_rwdata_end by a runtime calculated value (and freeing space > >> past the used portion accordingly, without shattering large pages if > >> used). > > > > OK, make sense for me. However, from a bootloader point of view it should be > > still part of BSS. Otherwise, the bootloader can put the end of Xen image too > > close to reserved/no ram/etc. region. This does not seem important right now > > but after applying the rest of my patch series it can make a pain. > > I'm afraid I don't understand: It needs to be part of the ELF > load segments or PE32+ loaded sections, but whether that's > part of .bss or an adjacent @nobits section at the source level > shouldn't matter at all. At source level I do not care much. And if it is a part "of the ELF load segments" and "PE32+ loaded sections" then it is OK for me. Daniel
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c index b6cbdad..82bf15b 100644 --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -120,11 +120,10 @@ static CHAR16 __initdata newline[] = L"\r\n"; * have to be disabled in xen/arch/arm/arm64/head.S; though BSS * should be initialized somehow before use of variables living there, * - use ebmalloc() in ARM/common EFI boot code, - * - call free_ebmalloc_unused_mem() somewhere in init code. */ #define EBMALLOC_SIZE MB(0) #else -#define EBMALLOC_SIZE MB(1) +#define EBMALLOC_SIZE KB(32) #endif static char __section(".bss.page_aligned") __aligned(PAGE_SIZE) @@ -144,19 +143,6 @@ static void __init __maybe_unused *ebmalloc(size_t size) return ptr; } -static void __init __maybe_unused free_ebmalloc_unused_mem(void) -{ - unsigned long start, end; - - start = (unsigned long)ebmalloc_mem + PAGE_ALIGN(ebmalloc_allocated); - end = (unsigned long)ebmalloc_mem + sizeof(ebmalloc_mem); - - destroy_xen_mappings(start, end); - init_xenheap_pages(__pa(start), __pa(end)); - - printk(XENLOG_INFO "Freed %lukB unused BSS memory\n", (end - start) >> 10); -} - /* * Include architecture specific implementation here, which references the * static globals defined above. @@ -1310,8 +1296,6 @@ void __init efi_init_memory(void) } *extra, *extra_head = NULL; #endif - free_ebmalloc_unused_mem(); - if ( !efi_enabled(EFI_BOOT) ) return;
Freeing part of the BSS back for general use proves to be problematic. It is not accounted for in xen_in_range(), causing errors when constructing the IOMMU tables, resulting in a failure to boot. Other smaller issues are that tboot treats the entire BSS as hypervisor data, creating and checking a MAC of it on S3, and that, by being 1MB in size, freeing it guarentees to shatter the hypervisor superpage mappings. Judging by the content stored in it, 1MB is overkill on size. Drop it to a more-reasonable 32kB and keep the entire buffer around after boot. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> --- xen/common/efi/boot.c | 18 +----------------- 1 file changed, 1 insertion(+), 17 deletions(-)