Message ID | 565BAA2E.5090102@cog.systems (mailing list archive) |
---|---|
State | New, archived |
Hi,

On Mon, Nov 30, 2015 at 12:45:18PM +1100, Carl van Schaik wrote:
> In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the idmap_pg_dir
> and swapper_pg_dir were moved from before the kernel to after it.
>
> The problem is that these symbols fall outside the range covered by
> the ELF file - outside of any section.
>
> A bootloader which loads the kernel ELF file and dynamically
> determines where to place the DTB, may try to place it after the
> kernel. We've just run into this problem and the DTB gets
> overwritten as soon as the pagetables are created.

We had similar issues with the BSS when booting Image files prior to
this and commit a2c1d73b94ed49f5 ("arm64: Update the Image header").
Since then, the image_size field in the Image header tells you how much
memory the kernel may clobber (including the BSS and page tables).

Prior to that, the page tables were below the kernel, and also not
described in any ELF section.

Others booting the kernel vmlinux haven't reported similar issues, so I
assume that either they are parsing the Image header, or getting lucky.
Parsing the header is necessary to get the correct text offset, too...

Pratyush, Geoff, I understood you were loading the kernel vmlinux for
kexec. Do you parse the Image header to figure out where to place
things?

> I'd suggest that the kernel either:
> A. document this boot requirement for where not to load a DTB

Do you have any particular suggestion?

We already describe the Image footprint (including BSS and page tables)
by the image_size in the Image header, which is sufficient. The size of
the BSS and page tables is effectively unbounded, so we can't define some
upper bound that will always be true.

The documentation is written on the assumption that an Image file is
being used rather than a vmlinux. Perhaps that is something to consider.

> B. update the vmlinux.lds.S such that these symbols (and _end) are
> properly covered by a section in the ELF, and thus preventing this
> issue.

I'm worried that this only solves this one case, and it means that there
are two (potentially conflicting) sources of information that a
bootloader might be using -- the ELF or the Image header. I don't want
to have to duplicate text_offset and so on, which implies that parsing
the Image header is necessary anyway.

That's something we can discuss if you send a patch (inline rather than
attached).

Thanks,
Mark.
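As a rough illustration of Mark's point, the following C sketch reads text_offset, image_size and flags from the 64-byte arm64 Image header and computes the end of the memory the kernel may clobber. Field offsets follow the header layout in Documentation/arm64/booting.txt; the function and macro names are hypothetical and do not come from any particular bootloader.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offsets into the 64-byte arm64 Image header (booting.txt layout). */
#define ARM64_HDR_SIZE        64
#define ARM64_HDR_TEXT_OFFSET 8
#define ARM64_HDR_IMAGE_SIZE  16
#define ARM64_HDR_FLAGS       24
#define ARM64_HDR_MAGIC       56
#define ARM64_IMAGE_MAGIC     0x644d5241u /* "ARM\x64", little endian */

struct arm64_image_info {
    uint64_t text_offset;
    uint64_t image_size;
    uint64_t flags;
};

/* Read an n-byte little-endian integer, independent of host endianness. */
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 if the buffer is too short, the magic is wrong,
 * or image_size is 0 (pre-v3.17 header: footprint unknown). */
static int arm64_parse_header(const uint8_t *buf, size_t len,
                              struct arm64_image_info *out)
{
    if (len < ARM64_HDR_SIZE)
        return -1;
    if (read_le(buf + ARM64_HDR_MAGIC, 4) != ARM64_IMAGE_MAGIC)
        return -1;
    out->text_offset = read_le(buf + ARM64_HDR_TEXT_OFFSET, 8);
    out->image_size  = read_le(buf + ARM64_HDR_IMAGE_SIZE, 8);
    out->flags       = read_le(buf + ARM64_HDR_FLAGS, 8);
    if (out->image_size == 0)
        return -1;
    return 0;
}

/* First byte past the region the kernel may clobber (BSS and page
 * tables included), given the 2 MiB aligned base text_offset is
 * relative to. */
static uint64_t arm64_footprint_end(uint64_t base,
                                    const struct arm64_image_info *info)
{
    return base + info->text_offset + info->image_size;
}
```

A loader would then treat everything from base + text_offset up to arm64_footprint_end() as reserved before placing a DTB or initrd, whatever the vmlinux section table says.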
On 1 December 2015 at 12:02, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Mon, Nov 30, 2015 at 12:45:18PM +1100, Carl van Schaik wrote:
>> In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the idmap_pg_dir
>> and swapper_pg_dir were moved from before the kernel to after it.
>>
>> The problem is that these symbols fall outside the range covered by
>> the ELF file - outside of any section.
>>
>> A bootloader which loads the kernel ELF file and dynamically
>> determines where to place the DTB, may try to place it after the
>> kernel. We've just run into this problem and the DTB gets
>> overwritten as soon as the pagetables are created.

Could you explain why you are using the ELF file and not the binary
image file? This is not future proof: currently, the Image is a
straight binary objcopy of vmlinux, but that is not guaranteed to
remain that way. Things like KASLR may require post-build steps that
mangle vmlinux or Image afterwards.

> [...]
>
>> B. update the vmlinux.lds.S such that these symbols (and _end) are
>> properly covered by a section in the ELF, and thus preventing this
>> issue.
>
> I'm worried that this only solves this one case, and it means that there
> are two (potentially conflicting) sources of information that a
> bootloader might be using -- the ELF or the Image header. I don't want
> to have to duplicate text_offset and so on, which implies that parsing
> the Image header is necessary anyway.
>
> That's something we can discuss if you send a patch (inline rather than
> attached).

I think updating the linker script to put the page tables into a
.pgdir section is reasonable, since it is part of the static footprint
of the kernel.

However, I strongly feel that the Image header should remain the
authoritative source of information regarding the nature (big/little
endian, page size) and the static footprint of the Image.
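Ard's point that the header describes the nature of the kernel can be sketched by decoding its flags field. This assumes the bit assignments documented in booting.txt: bit 0 for endianness and, in later revisions of that document, bits 1-2 for the page size (older kernels leave those bits zero, meaning "unspecified"). The helper names are illustrative only.

```c
#include <stdint.h>

enum arm64_endian { ARM64_LE = 0, ARM64_BE = 1 };

/* Bit 0 of the Image header flags: 1 if the kernel is big endian. */
static enum arm64_endian arm64_flags_endian(uint64_t flags)
{
    return (flags & 1) ? ARM64_BE : ARM64_LE;
}

/* Bits 1-2: kernel page size (0 = unspecified, 1 = 4K, 2 = 16K, 3 = 64K).
 * Returns the size in bytes, or 0 when the header does not say. */
static uint64_t arm64_flags_page_size(uint64_t flags)
{
    switch ((flags >> 1) & 3) {
    case 1:
        return 4096;
    case 2:
        return 16384;
    case 3:
        return 65536;
    default:
        return 0;
    }
}
```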
On 01/12/2015:11:02:55 AM, Mark Rutland wrote:
> Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> kexec. Do you parse the Image header to figure out where to place
> things?

Yes, ARM64 kexec-tools supports both ELF and binary Image loading, and in
both cases text_offset and image_size are calculated from the Image
header. Locations for other segments like the initrd or DTB are
calculated accordingly [1].

~Pratyush

[1] http://git.kernel.org/cgit/linux/kernel/git/geoff/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n585
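One way a loader might derive DTB and initrd locations from text_offset and image_size is sketched below. This is not the kexec-tools code linked above: place_segments and struct arm64_boot_layout are hypothetical, the DTB rules (8-byte aligned, at most 2 MiB) are as booting.txt describes them, and the 4 KiB initrd alignment is an arbitrary assumption.

```c
#include <stdint.h>

#define SZ_2M 0x200000ull

struct arm64_boot_layout {
    uint64_t dtb_addr;
    uint64_t initrd_addr;
};

/* Round x up to a multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Place the DTB, then the initrd, directly above the kernel footprint.
 * base is the 2 MiB aligned address that text_offset is relative to.
 * Returns 0 on success, -1 on bad inputs or if the segments overflow RAM. */
static int place_segments(uint64_t base, uint64_t text_offset,
                          uint64_t image_size, uint64_t dtb_size,
                          uint64_t initrd_size, uint64_t ram_end,
                          struct arm64_boot_layout *out)
{
    uint64_t kernel_end;

    if (image_size == 0 || dtb_size == 0 || dtb_size > SZ_2M)
        return -1;
    /* image_size already covers the BSS and the initial page tables. */
    kernel_end = base + text_offset + image_size;
    out->dtb_addr = align_up(kernel_end, 8);
    /* Assumed 4 KiB alignment for the initrd. */
    out->initrd_addr = align_up(out->dtb_addr + dtb_size, 4096);
    if (out->initrd_addr + initrd_size > ram_end)
        return -1;
    return 0;
}
```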
On 12/01/2015 06:52 AM, Ard Biesheuvel wrote:
> [...]
>
> I think updating the linker script to put the page tables into a
> .pgdir section is reasonable, since it is part of the static footprint
> of the kernel.
>
> However, I strongly feel that the Image header should remain the
> authoritative source of information regarding the nature (big/little
> endian, page size) and the static footprint of the Image.

I find `readelf -a | less` quite handy. Is there anything comparable for
the AArch64 Image format?

Please forgive my ignorance, but is the EFI stub another possible source
for this sort of information?

Thanks,
Christopher Covington
On Tue, Dec 01, 2015 at 05:09:18PM -0500, Christopher Covington wrote:
> On 12/01/2015 06:52 AM, Ard Biesheuvel wrote:
> > However, I strongly feel that the Image header should remain the
> > authoritative source of information regarding the nature (big/little
> > endian, page size) and the static footprint of the Image.
>
> I find `readelf -a | less` quite handy. Is there anything comparable for
> the AArch64 Image format?

Not that I am aware of. These days I just use od or hexedit, and parse
the header manually. It's documented, so it would be possible to write
one.

> Please forgive my ignorance, but is the EFI stub another possible source
> for this sort of information?

Not really. The PE/COFF header for the EFI stub realistically only tells
you whether or not the kernel has an EFI stub. It shouldn't be used to
derive information about the kernel itself.

Thanks,
Mark.
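Since the header is documented, a minimal readelf-style dumper of the kind Mark mentions might look like the sketch below. dump_image_header is a hypothetical name; it prints only the fields discussed in this thread plus the reserved word that booting.txt notes is used for the PE/COFF offset.

```c
#include <stdint.h>
#include <stdio.h>

/* Read an n-byte little-endian integer. */
static uint64_t le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Print the documented fields of an arm64 Image header read from `in`.
 * Returns 0 on success, -1 on a short read or a bad magic number. */
static int dump_image_header(FILE *in, FILE *out)
{
    unsigned char h[64];

    if (fread(h, 1, sizeof h, in) != sizeof h)
        return -1;
    if (le(h + 56, 4) != 0x644d5241u) { /* "ARM\x64" */
        fprintf(out, "not an arm64 Image (bad magic)\n");
        return -1;
    }
    fprintf(out, "text_offset: 0x%llx\n", (unsigned long long)le(h + 8, 8));
    fprintf(out, "image_size:  0x%llx\n", (unsigned long long)le(h + 16, 8));
    fprintf(out, "flags:       0x%llx\n", (unsigned long long)le(h + 24, 8));
    fprintf(out, "  endian:    %s\n", (h[24] & 1) ? "big" : "little");
    fprintf(out, "pe_offset:   0x%llx\n", (unsigned long long)le(h + 60, 4));
    return 0;
}
```

Wrapping it in a main() that opens argv[1] with fopen(argv[1], "rb") and calls dump_image_header(f, stdout) gives a small command-line tool.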
Hi,

On 1 December 2015 Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 1 December 2015 at 12:02, Mark Rutland <mark.rutland@arm.com>
> wrote:
> > [...]
>
> Could you explain why you are using the ELF file and not the binary image
> file?
> This is not future proof: currently, the Image is a straight binary
> objcopy of vmlinux, but that is not guaranteed to remain that way.
> Things like KASLR may require post build steps that mangle vmlinux or
> Image afterwards.

The reason we've been using ELF files is mostly legacy,
virtualization-related reasons in our systems; for example, we used to
patch symbols in the ELFs before device tree. However, since it hadn't
caused problems until now, we had continued to use it. We haven't yet
added AArch64 Linux boot image header parsing, but it should be trivial.

The other area we are looking into is optimized multi-VM static boot
images: constructing hypervisor-bundle images containing de-duplicated
Linux sections, allowing an ELF bootloader to populate multiple Linux VMs
from a smaller boot image, resulting in faster boot.

> > [...]
>
> I think updating the linker script to put the page tables into a
> .pgdir section is reasonable, since it is part of the static footprint
> of the kernel.

I agree.

> However, I strongly feel that the Image header should remain the
> authoritative source of information regarding the nature (big/little
> endian, page size) and the static footprint of the Image.

Agreed, and there are other ways to de-duplicate which will still work
with binary image inputs.
Hi,

On Tue, 2015-12-01 at 11:02 +0000, Mark Rutland wrote:
> Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> kexec. Do you parse the Image header to figure out where to place
> things?

Yes, in the kexec user tools we use text_offset to make
enough room for the kernel, but there is also the need for
page_offset.

We need to know the page_offset to be able to do virtual to
physical address conversions. We can calculate the page_offset
for a vmlinux image as page_offset = phdr->p_vaddr - text_offset.

The binary Image currently has no info about page_offset or
virtual addressing. We have a kexec-tools option for the
user to specify a page_offset. If that option is not provided
we try to look at the running kernel's symbols, and if that
fails, fall back to a default page_offset. This is less than
ideal, and certainly makes the binary Image less appealing to
use with kexec.

-Geoff
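Geoff's calculation can be sketched as follows, using the Elf64_Phdr type from <elf.h>. It assumes that `phdr` in his formula refers to the first PT_LOAD program header, and that PAGE_OFFSET maps to the same physical base that text_offset is relative to; both helpers are hypothetical, not kexec-tools functions.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* page_offset = phdr->p_vaddr - text_offset, taking the first PT_LOAD
 * segment as phdr (an assumption). Returns 0 on success, -1 if the
 * vmlinux has no loadable segment. */
static int vmlinux_page_offset(const Elf64_Phdr *phdrs, size_t n,
                               uint64_t text_offset, uint64_t *page_offset)
{
    for (size_t i = 0; i < n; i++) {
        if (phdrs[i].p_type == PT_LOAD) {
            *page_offset = phdrs[i].p_vaddr - text_offset;
            return 0;
        }
    }
    return -1;
}

/* Linear-map conversion, assuming page_offset corresponds to phys_offset. */
static uint64_t vmlinux_virt_to_phys(uint64_t va, uint64_t page_offset,
                                     uint64_t phys_offset)
{
    return va - page_offset + phys_offset;
}
```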
On Thu, Dec 03, 2015 at 12:46:35AM +1100, Carl van Schaik wrote:
> Hi,
>
> On 1 December 2015 Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > [...]
> >
> > Could you explain why you are using the ELF file and not the binary image
> > file?
> > This is not future proof: currently, the Image is a straight binary
> > objcopy of vmlinux, but that is not guaranteed to remain that way.
> > Things like KASLR may require post build steps that mangle vmlinux or
> > Image afterwards.
>
> The reason we've been using ELF files is mostly legacy,
> virtualization-related reasons in our systems; for example, we used to
> patch symbols in the ELFs before device tree. However, since it hadn't
> caused problems until now, we had continued to use it. We haven't yet
> added AArch64 Linux boot image header parsing, but it should be trivial.
>
> The other area we are looking into is optimized multi-VM static boot
> images: constructing hypervisor-bundle images containing de-duplicated
> Linux sections, allowing an ELF bootloader to populate multiple Linux VMs
> from a smaller boot image, resulting in faster boot.

Ok. Per Ard's comments, this may get broken in future by KASLR or
similar; we cannot make strong guarantees as to the vmlinux being
directly usable. That's a different discussion, though...

> > > [...]
> > >
> > > That's something we can discuss if you send a patch (inline rather than
> > > attached).
> >
> > I think updating the linker script to put the page tables into a
> > .pgdir section is reasonable, since it is part of the static footprint
> > of the kernel.
>
> I agree.

Ok. As above, please send a standalone, inline patch for this. Please Cc
at least myself, Ard, and Catalin. We can have any further discussion
there.

> > However, I strongly feel that the Image header should remain the
> > authoritative source of information regarding the nature (big/little
> > endian, page size) and the static footprint of the Image.
>
> Agreed, and there are other ways to de-duplicate which will still work
> with binary image inputs.

I completely agree that the Image is the canonical source of
information.

Thanks,
Mark.
On Wed, Dec 02, 2015 at 11:03:48AM -0800, Geoff Levand wrote:
> Hi,
>
> On Tue, 2015-12-01 at 11:02 +0000, Mark Rutland wrote:
> > Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> > kexec. Do you parse the Image header to figure out where to place
> > things?
>
> Yes, in the kexec user tools we use text_offset to make
> enough room for the kernel, but there is also the need for
> page_offset.
>
> We need to know the page_offset to be able to do virtual to
> physical address conversions. We can calculate the page_offset
> for a vmlinux image as page_offset = phdr->p_vaddr - text_offset.

I don't understand why you need to do that. Is that just so you can
figure out where to load the segments physically?

> The binary Image currently has no info about page_offset or
> virtual addressing. We have a kexec-tools option for the
> user to specify a page_offset. If that option is not provided
> we try to look at the running kernel's symbols, and if that
> fails, fall back to a default page_offset. This is less than
> ideal, and certainly makes the binary Image less appealing to
> use with kexec.

I don't follow at all why this complication is necessary. I don't think
that the kexec tools should be looking at the kernel symbols in that
manner, and I don't think that we need to expose the page offset via the
Image header.

The first loaded address in the vmlinux corresponds to PHYS_OFFSET +
TEXT_OFFSET. If you know that, you can figure out an offset to apply to
VAs to convert them to PAs when loading.

What am I missing?

Thanks,
Mark.
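Mark's alternative avoids page_offset entirely: anchor the lowest PT_LOAD virtual address to PHYS_OFFSET + TEXT_OFFSET and shift every segment by the same amount. A minimal sketch, assuming the program headers are already in memory as <elf.h> Elf64_Phdr entries; vmlinux_load_addrs is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Compute a physical load address for each PT_LOAD segment, given only
 * that the first (lowest) loaded VA corresponds to phys_offset +
 * text_offset. pa_out gets one entry per phdr (0 for non-PT_LOAD
 * entries). Returns the number of PT_LOAD segments. */
static size_t vmlinux_load_addrs(const Elf64_Phdr *phdrs, size_t n,
                                 uint64_t phys_offset, uint64_t text_offset,
                                 uint64_t *pa_out)
{
    uint64_t first_va = UINT64_MAX;
    size_t loads = 0;

    /* Find the first loaded virtual address. */
    for (size_t i = 0; i < n; i++)
        if (phdrs[i].p_type == PT_LOAD && phdrs[i].p_vaddr < first_va)
            first_va = phdrs[i].p_vaddr;

    for (size_t i = 0; i < n; i++) {
        pa_out[i] = 0;
        if (phdrs[i].p_type != PT_LOAD)
            continue;
        /* Same distance from the anchor PA as from the first loaded VA. */
        pa_out[i] = phys_offset + text_offset + (phdrs[i].p_vaddr - first_va);
        loads++;
    }
    return loads;
}
```

Because only differences between virtual addresses are used, the result does not depend on the PAGE_OFFSET the kernel was built with.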
From 25f5edabb5a719e24cc1865eb50b7894d0f976cf Mon Sep 17 00:00:00 2001
From: Carl van Schaik <carl@cog.systems>
Date: Mon, 30 Nov 2015 12:39:59 +1100
Subject: [PATCH] arm64: place initial page tables in ELF section

The swapper_pg_dir and idmap_pg_dir are placed above the kernel image,
after the BSS. Currently, these are outside of any section in the ELF,
which means a bootloader cannot determine from the ELF file alone where
it may safely place other data (DTB etc.). It needs to reserve an
arbitrary amount of space.

This patch places the idmap_pg_dir and swapper_pg_dir symbols into a new
.pg_dir section. With this defined, the bootloader can safely place
images at addresses after the end of the ELF.

Signed-off-by: Carl van Schaik <carl@cog.systems>
---
 arch/arm64/kernel/vmlinux.lds.S | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index a2c2986..759ae68 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -154,10 +154,12 @@ SECTIONS
 	BSS_SECTION(0, 0, 0)
 
 	. = ALIGN(PAGE_SIZE);
-	idmap_pg_dir = .;
-	. += IDMAP_DIR_SIZE;
-	swapper_pg_dir = .;
-	. += SWAPPER_DIR_SIZE;
+	.pg_dir : {
+		idmap_pg_dir = .;
+		. += IDMAP_DIR_SIZE;
+		swapper_pg_dir = .;
+		. += SWAPPER_DIR_SIZE;
+	}
 
 	_end = .;
 
-- 
2.3.2
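To check whether a given vmlinux has the property this patch aims for, a loader or a test could compare the end of the allocated sections with the value of _end taken from the symbol table. A minimal sketch over in-memory <elf.h> Elf64_Shdr entries; the helper names are hypothetical, and the consensus in the thread is that image_size in the Image header remains the authoritative footprint.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Highest address covered by any allocated (SHF_ALLOC) section. Without
 * a section around the page tables this stops at the end of the BSS. */
static uint64_t elf_alloc_end(const Elf64_Shdr *shdrs, size_t n)
{
    uint64_t end = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(shdrs[i].sh_flags & SHF_ALLOC))
            continue;
        if (shdrs[i].sh_addr + shdrs[i].sh_size > end)
            end = shdrs[i].sh_addr + shdrs[i].sh_size;
    }
    return end;
}

/* Nonzero if a loader trusting the section table alone would reserve
 * everything up to _end (end_symbol). */
static int sections_cover_end(const Elf64_Shdr *shdrs, size_t n,
                              uint64_t end_symbol)
{
    return elf_alloc_end(shdrs, n) >= end_symbol;
}
```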