Message ID | 1456192703-2274-2-git-send-email-ddaney.cavm@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Hi,

On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> There are two problems with the UEFI stub DT memory node removal
> routine:
> - it deletes nodes as it traverses the tree, which happens to work
>   but is not supported, as deletion invalidates the node iterator;
> - deleting memory nodes entirely may discard annotations in the form
>   of additional properties on the nodes.
>
> Since the discovery of DT memory nodes occurs strictly before the
> UEFI init sequence, we can simply clear the memblock memory table
> before parsing the UEFI memory map. This way, it is no longer
> necessary to remove the nodes, so we can remove that logic from the
> stub as well.

This is a little bit scary, but I guess this works.

My only concern is that when we get kexec, a subsequent kernel must also
have EFI memory map support, or things go bad for the next EFI-aware
kernel after that (as things like the runtime services may have been
corrupted by the kernel in the middle). It's difficult to fix the
general case later.

A different option would be to support status="disabled" for the memory
nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
cannot use memory without first having parsed the EFI memory map, and we
can still get NUMA info from the disabled nodes.

You'd still need a new kernel to take into account status, but at least
we'd know all kernels would avoid using RAM that potentially needs to be
preserved.

Ard, Rob, thoughts?

Mark.

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Signed-off-by: David Daney <david.daney@cavium.com>
> ---
>  drivers/firmware/efi/arm-init.c    |  8 ++++++++
>  drivers/firmware/efi/libstub/fdt.c | 24 +-----------------------
>  2 files changed, 9 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
> index 9e15d57..40c9d85 100644
> --- a/drivers/firmware/efi/arm-init.c
> +++ b/drivers/firmware/efi/arm-init.c
> @@ -143,6 +143,14 @@ static __init void reserve_regions(void)
>  	if (efi_enabled(EFI_DBG))
>  		pr_info("Processing EFI memory map:\n");
>
> +	/*
> +	 * Discard memblocks discovered so far: if there are any at this
> +	 * point, they originate from memory nodes in the DT, and UEFI
> +	 * uses its own memory map instead.
> +	 */
> +	memblock_dump_all();
> +	memblock_remove(0, ULLONG_MAX);
> +
>  	for_each_efi_memory_desc(&memmap, md) {
>  		paddr = md->phys_addr;
>  		npages = md->num_pages;
> diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
> index cf7b7d4..9df1560 100644
> --- a/drivers/firmware/efi/libstub/fdt.c
> +++ b/drivers/firmware/efi/libstub/fdt.c
> @@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>  			unsigned long map_size, unsigned long desc_size,
>  			u32 desc_ver)
>  {
> -	int node, prev, num_rsv;
> +	int node, num_rsv;
>  	int status;
>  	u32 fdt_val32;
>  	u64 fdt_val64;
> @@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
>  		goto fdt_set_fail;
>
>  	/*
> -	 * Delete any memory nodes present. We must delete nodes which
> -	 * early_init_dt_scan_memory may try to use.
> -	 */
> -	prev = 0;
> -	for (;;) {
> -		const char *type;
> -		int len;
> -
> -		node = fdt_next_node(fdt, prev, NULL);
> -		if (node < 0)
> -			break;
> -
> -		type = fdt_getprop(fdt, node, "device_type", &len);
> -		if (type && strncmp(type, "memory", len) == 0) {
> -			fdt_del_node(fdt, node);
> -			continue;
> -		}
> -
> -		prev = node;
> -	}
> -
> -	/*
>  	 * Delete all memory reserve map entries. When booting via UEFI,
>  	 * kernel will use the UEFI memory map to find reserved regions.
>  	 */
> --
> 1.8.3.1
>

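The first problem named in the commit message (deleting nodes while walking the tree invalidates the iterator) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the `toy_*` types and functions are hypothetical and are not libfdt. The model assumes nodes are packed in one array, so that a deletion shifts everything after it, which is roughly why a cursor taken before the deletion cannot be trusted afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a flattened tree: nodes packed in one array. */
struct toy_node {
	const char *name;
	const char *device_type;	/* NULL if the property is absent */
};

struct toy_tree {
	struct toy_node nodes[8];
	size_t count;
};

/* Deleting shifts every later node down, so later indices change meaning. */
static void toy_del_node(struct toy_tree *t, size_t idx)
{
	assert(idx < t->count);
	memmove(&t->nodes[idx], &t->nodes[idx + 1],
		(t->count - idx - 1) * sizeof(t->nodes[0]));
	t->count--;
}

static int toy_is_memory(const struct toy_node *n)
{
	return n->device_type && strcmp(n->device_type, "memory") == 0;
}

/*
 * Restart-style removal: after each deletion the scan starts over
 * instead of reusing a cursor that predates the deletion.
 * Returns the number of nodes removed.
 */
static size_t toy_remove_memory_nodes(struct toy_tree *t)
{
	size_t removed = 0;
	size_t i = 0;

	while (i < t->count) {
		if (toy_is_memory(&t->nodes[i])) {
			toy_del_node(t, i);
			removed++;
			i = 0;	/* cursor is stale: rescan from the start */
		} else {
			i++;
		}
	}
	return removed;
}
```

Rescanning costs more than a single pass, but it never relies on a position computed before the tree changed. The patch under review sidesteps the question entirely by not deleting nodes at all.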
On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >
> > There are two problems with the UEFI stub DT memory node removal
> > routine:
> > - it deletes nodes as it traverses the tree, which happens to work
> >   but is not supported, as deletion invalidates the node iterator;
> > - deleting memory nodes entirely may discard annotations in the form
> >   of additional properties on the nodes.
> >
> > Since the discovery of DT memory nodes occurs strictly before the
> > UEFI init sequence, we can simply clear the memblock memory table
> > before parsing the UEFI memory map. This way, it is no longer
> > necessary to remove the nodes, so we can remove that logic from the
> > stub as well.
>
> This is a little bit scary, but I guess this works.
>
> My only concern is that when we get kexec, a subsequent kernel must also
> have EFI memory map support, or things go bad for the next EFI-aware
> kernel after that (as things like the runtime services may have been
> corrupted by the kernel in the middle). It's difficult to fix the
> general case later.
>
> A different option would be to support status="disabled" for the memory
> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> cannot use memory without first having parsed the EFI memory map, and we
> can still get NUMA info from the disabled nodes.

So in that case, the middle, non-EFI kernel would fail to boot?
Realistically, once you've kexec'd a non-EFI payload, I don't think you
can rely on the EFI state remaining intact for future EFI applications.

Is this really something we should be trying to police in the kernel?

Will

On 23 February 2016 at 13:16, Will Deacon <will.deacon@arm.com> wrote:
> On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
>> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
>> > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >
>> > There are two problems with the UEFI stub DT memory node removal
>> > routine:
>> > - it deletes nodes as it traverses the tree, which happens to work
>> >   but is not supported, as deletion invalidates the node iterator;
>> > - deleting memory nodes entirely may discard annotations in the form
>> >   of additional properties on the nodes.
>> >
>> > Since the discovery of DT memory nodes occurs strictly before the
>> > UEFI init sequence, we can simply clear the memblock memory table
>> > before parsing the UEFI memory map. This way, it is no longer
>> > necessary to remove the nodes, so we can remove that logic from the
>> > stub as well.
>>
>> This is a little bit scary, but I guess this works.
>>
>> My only concern is that when we get kexec, a subsequent kernel must also
>> have EFI memory map support, or things go bad for the next EFI-aware
>> kernel after that (as things like the runtime services may have been
>> corrupted by the kernel in the middle). It's difficult to fix the
>> general case later.
>>
>> A different option would be to support status="disabled" for the memory
>> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
>> cannot use memory without first having parsed the EFI memory map, and we
>> can still get NUMA info from the disabled nodes.
>
> So in that case, the middle, non-EFI kernel would fail to boot?
> Realistically, once you've kexec'd a non-EFI payload, I don't think you
> can rely on the EFI state remaining intact for future EFI applications.
>
> Is this really something we should be trying to police in the kernel?
>

Well, we could add entries to /reserved-memory in the stub for all the
regions UEFI cares about, that would probably be sufficient to fix this
case.

On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
> Hi,
>
> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >
> > There are two problems with the UEFI stub DT memory node removal
> > routine:
> > - it deletes nodes as it traverses the tree, which happens to work
> >   but is not supported, as deletion invalidates the node iterator;
> > - deleting memory nodes entirely may discard annotations in the form
> >   of additional properties on the nodes.
> >
> > Since the discovery of DT memory nodes occurs strictly before the
> > UEFI init sequence, we can simply clear the memblock memory table
> > before parsing the UEFI memory map. This way, it is no longer
> > necessary to remove the nodes, so we can remove that logic from the
> > stub as well.
>
> This is a little bit scary, but I guess this works.

The way it is worded/implemented is, I agree. But if we simply say both
can be present and the kernel will default to UEFI memory map, that
seems sufficient to me.

> My only concern is that when we get kexec, a subsequent kernel must also
> have EFI memory map support, or things go bad for the next EFI-aware
> kernel after that (as things like the runtime services may have been
> corrupted by the kernel in the middle). It's difficult to fix the
> general case later.
>
> A different option would be to support status="disabled" for the memory
> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> cannot use memory without first having parsed the EFI memory map, and we
> can still get NUMA info from the disabled nodes.

That would be a bit strange that the node is disabled, but still used.

What if DT and UEFI tables are out of sync somehow? RAM is multiple
mapped and different addresses were picked for example.

> You'd still need a new kernel to take into account status, but at least
> we'd know all kernels would avoid using RAM that potentially needs to be
> preserved.
>
> Ard, Rob, thoughts?

On 2/23/2016 3:58 AM, Mark Rutland wrote:
> Hi,
>
> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
>> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>
>> There are two problems with the UEFI stub DT memory node removal
>> routine:
>> - it deletes nodes as it traverses the tree, which happens to work
>>   but is not supported, as deletion invalidates the node iterator;
>> - deleting memory nodes entirely may discard annotations in the form
>>   of additional properties on the nodes.
>>
>> Since the discovery of DT memory nodes occurs strictly before the
>> UEFI init sequence, we can simply clear the memblock memory table
>> before parsing the UEFI memory map. This way, it is no longer
>> necessary to remove the nodes, so we can remove that logic from the
>> stub as well.
>
> This is a little bit scary, but I guess this works.
>
> My only concern is that when we get kexec, a subsequent kernel must also
> have EFI memory map support, or things go bad for the next EFI-aware
> kernel after that (as things like the runtime services may have been
> corrupted by the kernel in the middle). It's difficult to fix the
> general case later.
>
> A different option would be to support status="disabled" for the memory
> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> cannot use memory without first having parsed the EFI memory map, and we
> can still get NUMA info from the disabled nodes.

Please do not play games of treating nodes with status="disabled" as
valid nodes. The mindset should be if it is disabled, it does not exist.

There have been two bugs reported in the last week where code should
have been ignoring disabled nodes and failed to. An audit of code
scanning all nodes instead of all enabled nodes is now on my todo list.

< snip >

-Frank

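The "if it is disabled, it does not exist" rule above can be sketched as a predicate that every scan applies before it looks at a node. This is a stand-alone illustration, not kernel code: the `toy_*` names are hypothetical, and the sketch assumes the usual Devicetree convention that a node with no status property, or with status "okay" (or the older spelling "ok"), counts as enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node: only the fields this sketch needs. */
struct toy_node {
	const char *name;
	const char *status;	/* NULL if the property is absent */
};

/* Assumed convention: absent status, "okay" or "ok" means enabled. */
static int toy_node_is_available(const struct toy_node *n)
{
	assert(n != NULL);
	if (!n->status)
		return 1;
	return strcmp(n->status, "okay") == 0 || strcmp(n->status, "ok") == 0;
}

/* A scan that treats disabled nodes as if they were not there at all. */
static size_t toy_count_available(const struct toy_node *nodes, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (toy_node_is_available(&nodes[i]))
			count++;
	}
	return count;
}
```

Putting the check in one helper, rather than repeating it at each call site, is what makes an audit of "all nodes" versus "all enabled nodes" scans tractable.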
On Wed, Feb 24, 2016 at 1:03 PM, Frank Rowand <frowand.list@gmail.com> wrote:
> On 2/23/2016 3:58 AM, Mark Rutland wrote:
>> Hi,
>>
>> On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
>>> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>
>>> There are two problems with the UEFI stub DT memory node removal
>>> routine:
>>> - it deletes nodes as it traverses the tree, which happens to work
>>>   but is not supported, as deletion invalidates the node iterator;
>>> - deleting memory nodes entirely may discard annotations in the form
>>>   of additional properties on the nodes.
>>>
>>> Since the discovery of DT memory nodes occurs strictly before the
>>> UEFI init sequence, we can simply clear the memblock memory table
>>> before parsing the UEFI memory map. This way, it is no longer
>>> necessary to remove the nodes, so we can remove that logic from the
>>> stub as well.
>>
>> This is a little bit scary, but I guess this works.
>>
>> My only concern is that when we get kexec, a subsequent kernel must also
>> have EFI memory map support, or things go bad for the next EFI-aware
>> kernel after that (as things like the runtime services may have been
>> corrupted by the kernel in the middle). It's difficult to fix the
>> general case later.
>>
>> A different option would be to support status="disabled" for the memory
>> nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
>> cannot use memory without first having parsed the EFI memory map, and we
>> can still get NUMA info from the disabled nodes.
>
> Please do not play games of treating nodes with status="disabled" as
> valid nodes. The mindset should be if it is disabled, it does not exist.
>
> There have been two bugs reported in the last week where code should
> have been ignoring disabled nodes and failed to. An audit of code
> scanning all nodes instead of all enabled nodes is now on my todo list.

Perhaps we should merge the default/available variants of iterators into
one. I suspect there are some valid uses. Otherwise, we could also just
not even populate those nodes in the live tree. There are some cases
where the kernel changes the status.

Rob

On Wed, Feb 24, 2016 at 11:03:08AM -0800, Frank Rowand wrote:
> On 2/23/2016 3:58 AM, Mark Rutland wrote:
> > Hi,
> >
> > On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> >> From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >>
> >> There are two problems with the UEFI stub DT memory node removal
> >> routine:
> >> - it deletes nodes as it traverses the tree, which happens to work
> >>   but is not supported, as deletion invalidates the node iterator;
> >> - deleting memory nodes entirely may discard annotations in the form
> >>   of additional properties on the nodes.
> >>
> >> Since the discovery of DT memory nodes occurs strictly before the
> >> UEFI init sequence, we can simply clear the memblock memory table
> >> before parsing the UEFI memory map. This way, it is no longer
> >> necessary to remove the nodes, so we can remove that logic from the
> >> stub as well.
> >
> > This is a little bit scary, but I guess this works.
> >
> > My only concern is that when we get kexec, a subsequent kernel must also
> > have EFI memory map support, or things go bad for the next EFI-aware
> > kernel after that (as things like the runtime services may have been
> > corrupted by the kernel in the middle). It's difficult to fix the
> > general case later.
> >
> > A different option would be to support status="disabled" for the memory
> > nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> > cannot use memory without first having parsed the EFI memory map, and we
> > can still get NUMA info from the disabled nodes.
>
> Please do not play games of treating nodes with status="disabled" as
> valid nodes. The mindset should be if it is disabled, it does not exist.

I completely agree with this generally.

The only possible wiggle room is ePAPR's description of the precise
meaning of the status property being binding-specific (and there may be
some way to later "enable" the node or otherwise make use of it).

As with above, we'd only be extracting some information in the presence
of a UEFI memory map. I agree that this is not a great pattern, and we
don't necessarily want that even for "safe" cases like NUMA.

> There have been two bugs reported in the last week where code should
> have been ignoring disabled nodes and failed to. An audit of code
> scanning all nodes instead of all enabled nodes is now on my todo list.

That would be great!

Mark.

On Tue, Feb 23, 2016 at 04:12:02PM -0600, Rob Herring wrote:
> On Tue, Feb 23, 2016 at 11:58:05AM +0000, Mark Rutland wrote:
> > Hi,
> >
> > On Mon, Feb 22, 2016 at 05:58:19PM -0800, David Daney wrote:
> > > From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > >
> > > There are two problems with the UEFI stub DT memory node removal
> > > routine:
> > > - it deletes nodes as it traverses the tree, which happens to work
> > >   but is not supported, as deletion invalidates the node iterator;
> > > - deleting memory nodes entirely may discard annotations in the form
> > >   of additional properties on the nodes.
> > >
> > > Since the discovery of DT memory nodes occurs strictly before the
> > > UEFI init sequence, we can simply clear the memblock memory table
> > > before parsing the UEFI memory map. This way, it is no longer
> > > necessary to remove the nodes, so we can remove that logic from the
> > > stub as well.
> >
> > This is a little bit scary, but I guess this works.
>
> The way it is worded/implemented is, I agree. But if we simply say both
> can be present and the kernel will default to UEFI memory map, that
> seems sufficient to me.
>
> > My only concern is that when we get kexec, a subsequent kernel must also
> > have EFI memory map support, or things go bad for the next EFI-aware
> > kernel after that (as things like the runtime services may have been
> > corrupted by the kernel in the middle). It's difficult to fix the
> > general case later.
> >
> > A different option would be to support status="disabled" for the memory
> > nodes, and ignore these in early_init_dt_scan_memory. That way a kernel
> > cannot use memory without first having parsed the EFI memory map, and we
> > can still get NUMA info from the disabled nodes.
>
> That would be a bit strange that the node is disabled, but still used.

I agree this would be strange, and not necessarily a precedent we'd want
to see copied elsewhere.

Per ePAPR, a "disabled" node can be enabled in a binding-specific
manner, so having the presence of a UEFI memory map "enable" the NUMA
information would appear to be permitted.

> What if DT and UEFI tables are out of sync somehow? RAM is multiple
> mapped and different addresses were picked for example.

That applies regardless of the status of the memory nodes.

My suggestion was only that we acquired the NUMA node information, and
added this node information (and not any additional extent of memory) to
the UEFI memory map.

This is precisely what we do with Ard's code, with the exception that in
the absence of a UEFI memory map the kernel would know it was not
permitted to access memory.

Mark.

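The idea of taking only the NUMA node information from the DT memory nodes, and no additional extent of memory, can be sketched as an annotation pass over the UEFI regions. This is an illustrative model only: the `toy_*` names are hypothetical, the lookup uses just the base address of each region, and a region that straddles two DT ranges is not split, which real code would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NO_NODE (-1)

/* Hypothetical range record used for both DT nodes and UEFI regions. */
struct toy_range {
	uint64_t base;
	uint64_t size;
	int nid;	/* NUMA node id, or TOY_NO_NODE if unknown */
};

/* Return the node id of the DT range containing addr, if any. */
static int toy_dt_nid_for(const struct toy_range *dt, size_t n_dt,
			  uint64_t addr)
{
	for (size_t i = 0; i < n_dt; i++) {
		/* Subtraction form avoids overflow in base + size. */
		if (addr >= dt[i].base && addr - dt[i].base < dt[i].size)
			return dt[i].nid;
	}
	return TOY_NO_NODE;
}

/*
 * Copy node ids from the DT ranges onto the UEFI regions. The UEFI
 * regions' base and size are never modified, so the DT cannot add
 * memory that the UEFI memory map does not describe.
 */
static void toy_annotate_efi_map(struct toy_range *efi, size_t n_efi,
				 const struct toy_range *dt, size_t n_dt)
{
	assert(efi != NULL && dt != NULL);
	for (size_t i = 0; i < n_efi; i++)
		efi[i].nid = toy_dt_nid_for(dt, n_dt, efi[i].base);
}
```

A UEFI region that no DT range covers simply keeps an unknown node id, which is one way the "tables out of sync" case raised above would show up in such a scheme.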
diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 9e15d57..40c9d85 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -143,6 +143,14 @@ static __init void reserve_regions(void)
 	if (efi_enabled(EFI_DBG))
 		pr_info("Processing EFI memory map:\n");
 
+	/*
+	 * Discard memblocks discovered so far: if there are any at this
+	 * point, they originate from memory nodes in the DT, and UEFI
+	 * uses its own memory map instead.
+	 */
+	memblock_dump_all();
+	memblock_remove(0, ULLONG_MAX);
+
 	for_each_efi_memory_desc(&memmap, md) {
 		paddr = md->phys_addr;
 		npages = md->num_pages;
diff --git a/drivers/firmware/efi/libstub/fdt.c b/drivers/firmware/efi/libstub/fdt.c
index cf7b7d4..9df1560 100644
--- a/drivers/firmware/efi/libstub/fdt.c
+++ b/drivers/firmware/efi/libstub/fdt.c
@@ -24,7 +24,7 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
 			unsigned long map_size, unsigned long desc_size,
 			u32 desc_ver)
 {
-	int node, prev, num_rsv;
+	int node, num_rsv;
 	int status;
 	u32 fdt_val32;
 	u64 fdt_val64;
@@ -54,28 +54,6 @@ efi_status_t update_fdt(efi_system_table_t *sys_table, void *orig_fdt,
 		goto fdt_set_fail;
 
 	/*
-	 * Delete any memory nodes present. We must delete nodes which
-	 * early_init_dt_scan_memory may try to use.
-	 */
-	prev = 0;
-	for (;;) {
-		const char *type;
-		int len;
-
-		node = fdt_next_node(fdt, prev, NULL);
-		if (node < 0)
-			break;
-
-		type = fdt_getprop(fdt, node, "device_type", &len);
-		if (type && strncmp(type, "memory", len) == 0) {
-			fdt_del_node(fdt, node);
-			continue;
-		}
-
-		prev = node;
-	}
-
-	/*
 	 * Delete all memory reserve map entries. When booting via UEFI,
 	 * kernel will use the UEFI memory map to find reserved regions.
 	 */
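
The ordering the patch relies on (DT memory nodes are discovered first, then the table is cleared, then it is refilled from the UEFI memory map) can be modelled in a few lines. This is a sketch under stated assumptions, not the kernel's memblock implementation: the `toy_*` names are hypothetical, the table is a fixed-size array, and `toy_memblock_remove_all()` stands in for the `memblock_remove(0, ULLONG_MAX)` call in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 16

struct toy_region {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical, much simplified memory table. */
struct toy_memblock {
	struct toy_region regions[TOY_MAX_REGIONS];
	size_t cnt;
};

/* Append a region; returns 0 on success, -1 if the table is full. */
static int toy_memblock_add(struct toy_memblock *mb, uint64_t base,
			    uint64_t size)
{
	if (mb->cnt == TOY_MAX_REGIONS)
		return -1;
	mb->regions[mb->cnt].base = base;
	mb->regions[mb->cnt].size = size;
	mb->cnt++;
	return 0;
}

/* Stand-in for removing the whole address range from the table. */
static void toy_memblock_remove_all(struct toy_memblock *mb)
{
	mb->cnt = 0;
}

/*
 * Discard whatever was registered earlier (in this model, regions
 * that came from DT memory nodes), then repopulate the table from
 * the UEFI memory map only.
 */
static int toy_reserve_regions(struct toy_memblock *mb,
			       const struct toy_region *efi_map, size_t n)
{
	assert(mb != NULL);
	toy_memblock_remove_all(mb);
	for (size_t i = 0; i < n; i++) {
		if (toy_memblock_add(mb, efi_map[i].base, efi_map[i].size))
			return -1;
	}
	return 0;
}
```

Because the clearing step runs unconditionally before the UEFI map is parsed, the DT nodes can stay in the tree with their extra properties intact, which is the second problem the commit message set out to fix.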