arm64 boot requirements

Message ID 565BAA2E.5090102@cog.systems (mailing list archive)
State New, archived
Commit Message

Carl van Schaik Nov. 30, 2015, 1:45 a.m. UTC
In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the idmap_pg_dir and 
swapper_pg_dir were moved from before the kernel to after it.

The problem is that these symbols fall outside the range covered by the 
ELF file - outside of any section.

A bootloader which loads the kernel ELF file and dynamically determines 
where to place the DTB may try to place it after the kernel. We've just
run into this problem and the DTB gets overwritten as soon as the 
pagetables are created.

I'd suggest that the kernel either:
  A. document this boot requirement for where not to load a DTB
  B. update vmlinux.lds.S such that these symbols (and _end) are
properly covered by a section in the ELF, thus preventing this issue.

thanks,
Carl

Comments

Mark Rutland Dec. 1, 2015, 11:02 a.m. UTC | #1
Hi,

On Mon, Nov 30, 2015 at 12:45:18PM +1100, Carl van Schaik wrote:
> In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the idmap_pg_dir
> and swapper_pg_dir where moved from before the kernel to after it.
> 
> The problem is that these symbols fall outside the range covered by
> the ELF file - outside of any section.
> 
> A bootloader which loads the kernel ELF file and dynamically
> determines where to place the DTB, may try place it after the
> kernel. We've just run into this problem and the DTB gets
> overwritten as soon as the pagetables are created.

We had similar issues with the BSS when booting Image files prior to
this and commit a2c1d73b94ed49f5 ("arm64: Update the Image header").
Since then, the image_size field in the Image header tells you how much
memory the kernel may clobber (including the BSS and page tables).

Prior to that, the page tables were below the kernel, and also not
described in any ELF section.

Others booting the kernel vmlinux haven't reported similar issues, so I
assume that either they are parsing the Image header, or getting lucky.
Parsing the header is necessary to get the correct text offset, too...
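
As an illustration of the point above, here is a minimal sketch of
deriving the clobber range from the Image header. It assumes the 64-byte
little-endian header layout described in Documentation/arm64/booting.txt;
the function name kernel_footprint and the `base` parameter (the
2MB-aligned address chosen by the loader) are hypothetical.

```python
import struct

# arm64 Image header: 64 bytes, little-endian. Field order:
# code0, code1, text_offset, image_size, flags, res2, res3, res4, magic, res5
IMAGE_HEADER = struct.Struct("<IIQQQQQQII")
IMAGE_MAGIC = 0x644D5241  # the bytes "ARM\x64" read as a little-endian u32


def kernel_footprint(header_bytes, base):
    """Return (start, end) of the memory the kernel may clobber.

    `base` is a hypothetical loader-chosen, 2MB-aligned address; the
    Image itself is placed at base + text_offset.
    """
    fields = IMAGE_HEADER.unpack_from(header_bytes)
    text_offset, image_size, magic = fields[2], fields[3], fields[8]
    if magic != IMAGE_MAGIC:
        raise ValueError("not an arm64 Image header")
    if image_size == 0:
        # Older Images do not fill in image_size, so the footprint
        # cannot be derived from the header alone.
        raise ValueError("image_size not set; footprint unknown")
    start = base + text_offset
    return start, start + image_size
```

Anything placed by the loader (DTB, initrd) would then need to stay
outside the returned range.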

Pratyush, Geoff, I understood you were loading the kernel vmlinux for
kexec. Do you parse the Image header to figure out where to place
things?

> I'd suggest that the kernel either:
>  A. document this boot requirement for where not to load a DTB

Do you have any particular suggestion?

We already describe the Image footprint (including BSS and page tables)
by the image_size in the Image header, which is sufficient. The size of
the BSS and page tables is effectively unbounded, so we can't define some
upper bound that will always be true.

The documentation is written on the assumption that an Image file is
being used rather than a vmlinux. Perhaps that is something to consider.

>  B. update the vmlinux.lds.S such that these symbols (and _end) are
> properly covered by a section in the ELF, and thus preventing this
> issue.

I'm worried that this only solves this one case, and it means that there
are two (potentially conflicting) sources of information that a
bootloader might be using -- the ELF or the Image header. I don't want
to have to duplicate text_offset and so on, which implies that parsing
the Image header is necessary anyway.

That's something we can discuss if you send a patch (inline rather than
attached).

Thanks,
Mark.
Ard Biesheuvel Dec. 1, 2015, 11:52 a.m. UTC | #2
On 1 December 2015 at 12:02, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi,
>
> On Mon, Nov 30, 2015 at 12:45:18PM +1100, Carl van Schaik wrote:
>> In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the idmap_pg_dir
>> and swapper_pg_dir where moved from before the kernel to after it.
>>
>> The problem is that these symbols fall outside the range covered by
>> the ELF file - outside of any section.
>>
>> A bootloader which loads the kernel ELF file and dynamically
>> determines where to place the DTB, may try place it after the
>> kernel. We've just run into this problem and the DTB gets
>> overwritten as soon as the pagetables are created.

Could you explain why you are using the ELF file and not the binary image file?
This is not future proof: currently, the Image is a straight binary
objcopy of vmlinux, but that is not guaranteed to remain that way.
Things like KASLR may require post build steps that mangle vmlinux or
Image afterwards.

>
> We had similar issues with the BSS when booting Image files prior to
> this and commit a2c1d73b94ed49f5 ("arm64: Update the Image header").
> Since then, the image_size field in the Image header tells you how much
> memory the kernel may clobber (including the BSS and page tables).
>
> Prior to that, the page tables were below the kernel, and also not
> described in any ELF section.
>
> Others booting the kernel vmlinux haven't reported similar issues, so I
> assume that either they are parsing the Image header, or getting lucky.
> Parsing the header is necessary to get the correct text offset, too...
>
> Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> kexec. Do you parse the Image header to figure out where to place
> things?
>
>> I'd suggest that the kernel either:
>>  A. document this boot requirement for where not to load a DTB
>
> Do you have any particular suggestion?
>
> We already describe the Image footprint (including BSS and page tables)
> by the image_size in the Image header, which is sufficient. The size of
> the BSS and page tables is effectively unbound, so we can't define some
> upper bound that will always be true.
>
> The documentation is written on the assumption that an Image file is
> being used rather than a vmlinux. Perhaps that is something to consider.
>
>>  B. update the vmlinux.lds.S such that these symbols (and _end) are
>> properly covered by a section in the ELF, and thus preventing this
>> issue.
>
> I'm worried that this only solves this one case, and it means that there
> are two (potentially conflicting) sources of information that a
> bootloader might be using -- the ELF or the Image header. I don't want
> to have to duplicate text_offset and so on, which implies that parsing
> the Image header is necessary anyway.
>
> That's something we can discuss if you send a patch (inline rather than
> attached).
>

I think updating the linker script to put the page tables into a
.pgdir section is reasonable, since it is part of the static footprint
of the kernel.

However, I strongly feel that the Image header should remain the
authoritative source of information regarding the nature (big/little
endian, page size) and the static footprint of the Image.
Pratyush Anand Dec. 1, 2015, 12:50 p.m. UTC | #3
On 01/12/2015:11:02:55 AM, Mark Rutland wrote:
> Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> kexec. Do you parse the Image header to figure out where to place
> things?

Yes, ARM64 kexec-tools supports both ELF and binary Image loading, and in
both cases text_offset and image_size are obtained from the Image header.
Locations for other segments like the initrd or DTB are calculated accordingly [1].
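
A rough sketch of that kind of placement, not the actual kexec-tools
code: the helper names and the 2MB alignment are assumptions chosen for
illustration only.

```python
def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def place_after_kernel(base, text_offset, image_size, blobs, alignment=0x200000):
    """Assign load addresses to blobs above the kernel footprint.

    `blobs` maps a name (e.g. "initrd", "dtb") to a size in bytes.
    The 2MB default alignment is an illustrative choice, not a rule
    taken from kexec-tools.
    """
    # First free address above everything the kernel may clobber.
    cursor = align_up(base + text_offset + image_size, alignment)
    placement = {}
    for name, size in blobs.items():
        placement[name] = cursor
        cursor = align_up(cursor + size, alignment)
    return placement
```

Because the starting point is derived from image_size, nothing placed
this way overlaps the BSS or the initial page tables.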

~Pratyush

[1] http://git.kernel.org/cgit/linux/kernel/git/geoff/kexec-tools.git/tree/kexec/arch/arm64/kexec-arm64.c#n585
Christopher Covington Dec. 1, 2015, 10:09 p.m. UTC | #4
On 12/01/2015 06:52 AM, Ard Biesheuvel wrote:
> On 1 December 2015 at 12:02, Mark Rutland <mark.rutland@arm.com> wrote:
>> Hi,
>>
>> On Mon, Nov 30, 2015 at 12:45:18PM +1100, Carl van Schaik wrote:
>>> In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the idmap_pg_dir
>>> and swapper_pg_dir where moved from before the kernel to after it.
>>>
>>> The problem is that these symbols fall outside the range covered by
>>> the ELF file - outside of any section.
>>>
>>> A bootloader which loads the kernel ELF file and dynamically
>>> determines where to place the DTB, may try place it after the
>>> kernel. We've just run into this problem and the DTB gets
>>> overwritten as soon as the pagetables are created.
> 
> Could you explain why you are using the ELF file and not the binary image file?
> This is not future proof: currently, the Image is a straight binary
> objcopy of vmlinux, but that is not guaranteed to remain that way.
> Things like KASLR may require post build steps that mangle vmlinux or
> Image afterwards.
> 
>>
>> We had similar issues with the BSS when booting Image files prior to
>> this and commit a2c1d73b94ed49f5 ("arm64: Update the Image header").
>> Since then, the image_size field in the Image header tells you how much
>> memory the kernel may clobber (including the BSS and page tables).
>>
>> Prior to that, the page tables were below the kernel, and also not
>> described in any ELF section.
>>
>> Others booting the kernel vmlinux haven't reported similar issues, so I
>> assume that either they are parsing the Image header, or getting lucky.
>> Parsing the header is necessary to get the correct text offset, too...
>>
>> Pratyush, Geoff, I understood you were loading the kernel vmlinux for
>> kexec. Do you parse the Image header to figure out where to place
>> things?
>>
>>> I'd suggest that the kernel either:
>>>  A. document this boot requirement for where not to load a DTB
>>
>> Do you have any particular suggestion?
>>
>> We already describe the Image footprint (including BSS and page tables)
>> by the image_size in the Image header, which is sufficient. The size of
>> the BSS and page tables is effectively unbound, so we can't define some
>> upper bound that will always be true.
>>
>> The documentation is written on the assumption that an Image file is
>> being used rather than a vmlinux. Perhaps that is something to consider.
>>
>>>  B. update the vmlinux.lds.S such that these symbols (and _end) are
>>> properly covered by a section in the ELF, and thus preventing this
>>> issue.
>>
>> I'm worried that this only solves this one case, and it means that there
>> are two (potentially conflicting) sources of information that a
>> bootloader might be using -- the ELF or the Image header. I don't want
>> to have to duplicate text_offset and so on, which implies that parsing
>> the Image header is necessary anyway.
>>
>> That's something we can discuss if you send a patch (inline rather than
>> attached).
>>
> 
> I think updating the linker script to put the page tables into a
> .pgdir section is reasonable, since it is part of the static footprint
> of the kernel.
> 
> However, I strongly feel that the Image header should remain the
> authoritative source of information regarding the nature (big/little
> endian, page size) and the static footprint of the Image.

I find `readelf -a | less` quite handy. Is there anything comparable for
the AArch64 Image format?

Please forgive my ignorance, but is the EFI stub another possible source
for this sort of information?

Thanks,
Christopher Covington
Mark Rutland Dec. 2, 2015, 10:26 a.m. UTC | #5
On Tue, Dec 01, 2015 at 05:09:18PM -0500, Christopher Covington wrote:
> On 12/01/2015 06:52 AM, Ard Biesheuvel wrote:
> > However, I strongly feel that the Image header should remain the
> > authoritative source of information regarding the nature (big/little
> > endian, page size) and the static footprint of the Image.
> 
> I find `readelf -a | less` quite handy. Is there anything comparable for
> the AArch64 Image format?

Not that I am aware of. These days I just use od or hexedit, and parse
the header manually.

It's documented, so it would be possible to write one.
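
For instance, a small dump helper might look like the sketch below. It
assumes the header layout and flag bits described in
Documentation/arm64/booting.txt (bit 0 for endianness, bits 1-2 for page
size); describe_image_header is a hypothetical name, not an existing tool.

```python
import struct

IMAGE_HEADER = struct.Struct("<IIQQQQQQII")  # 64 bytes, little-endian
IMAGE_MAGIC = 0x644D5241                     # "ARM\x64"
PAGE_SIZES = {0: "unspecified", 1: "4K", 2: "16K", 3: "64K"}


def describe_image_header(data):
    """Return human-readable lines describing an arm64 Image header."""
    (_code0, _code1, text_offset, image_size, flags,
     _res2, _res3, _res4, magic, _res5) = IMAGE_HEADER.unpack_from(data)
    if magic != IMAGE_MAGIC:
        raise ValueError("bad magic: not an arm64 Image")
    return [
        f"text_offset: {text_offset:#x}",
        f"image_size:  {image_size:#x}",
        # flags bit 0: kernel endianness (1 = big endian)
        f"endianness:  {'big' if flags & 1 else 'little'}",
        # flags bits 1-2: kernel page size
        f"page_size:   {PAGE_SIZES[(flags >> 1) & 3]}",
    ]
```

It could be fed the first 64 bytes of an Image file, for example
open("Image", "rb").read(64), with each returned line printed in turn.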

> Please forgive my ignorance, but is the EFI stub another possible source
> for sort of information?

Not really. The PE/COFF header for the EFI stub realistically only tells
you whether or not the kernel has an EFI stub. It shouldn't be used to
derive information about the kernel itself.

Thanks,
Mark.
Carl van Schaik Dec. 2, 2015, 1:46 p.m. UTC | #6
Hi,

On 1 December 2015 Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 1 December 2015 at 12:02, Mark Rutland <mark.rutland@arm.com>
> wrote:
> > Hi,
> >
> > On Mon, Nov 30, 2015 at 12:45:18PM +1100, Carl van Schaik wrote:
> >> In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the
> idmap_pg_dir
> >> and swapper_pg_dir where moved from before the kernel to after it.
> >>
> >> The problem is that these symbols fall outside the range covered by
> >> the ELF file - outside of any section.
> >>
> >> A bootloader which loads the kernel ELF file and dynamically
> >> determines where to place the DTB, may try place it after the
> >> kernel. We've just run into this problem and the DTB gets
> >> overwritten as soon as the pagetables are created.
> 
> Could you explain why you are using the ELF file and not the binary image
> file?
> This is not future proof: currently, the Image is a straight binary
> objcopy of vmlinux, but that is not guaranteed to remain that way.
> Things like KASLR may require post build steps that mangle vmlinux or
> Image afterwards.

We've been using ELF files mostly for legacy, virtualization-related reasons
in our systems; for example, we used to patch symbols in the ELFs before
device-tree. Since it hadn't caused problems until now, we had continued
to do so. We haven't yet added AArch64 Linux boot Image header parsing,
but it should be trivial.

The other area we are looking into is optimized multi-VM static boot images by
constructing hypervisor-bundle images containing de-duplicated Linux sections,
allowing an ELF bootloader to populate multiple Linux VMs from a smaller boot
image - resulting in faster boot.

> >
> > We had similar issues with the BSS when booting Image files prior to
> > this and commit a2c1d73b94ed49f5 ("arm64: Update the Image header").
> > Since then, the image_size field in the Image header tells you how much
> > memory the kernel may clobber (including the BSS and page tables).
> >
> > Prior to that, the page tables were below the kernel, and also not
> > described in any ELF section.
> >
> > Others booting the kernel vmlinux haven't reported similar issues, so I
> > assume that either they are parsing the Image header, or getting lucky.
> > Parsing the header is necessary to get the correct text offset, too...
> >
> > Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> > kexec. Do you parse the Image header to figure out where to place
> > things?
> >
> >> I'd suggest that the kernel either:
> >>  A. document this boot requirement for where not to load a DTB
> >
> > Do you have any particular suggestion?
> >
> > We already describe the Image footprint (including BSS and page tables)
> > by the image_size in the Image header, which is sufficient. The size of
> > the BSS and page tables is effectively unbound, so we can't define some
> > upper bound that will always be true.
> >
> > The documentation is written on the assumption that an Image file is
> > being used rather than a vmlinux. Perhaps that is something to consider.
> >
> >>  B. update the vmlinux.lds.S such that these symbols (and _end) are
> >> properly covered by a section in the ELF, and thus preventing this
> >> issue.
> >
> > I'm worried that this only solves this one case, and it means that there
> > are two (potentially conflicting) sources of information that a
> > bootloader might be using -- the ELF or the Image header. I don't want
> > to have to duplicate text_offset and so on, which implies that parsing
> > the Image header is necessary anyway.
> >
> > That's something we can discuss if you send a patch (inline rather than
> > attached).
> >
> 
> I think updating the linker script to put the page tables into a
> .pgdir section is reasonable, since it is part of the static footprint
> of the kernel.

I agree

> However, I strongly feel that the Image header should remain the
> authoritative source of information regarding the nature (big/little
> endian, page size) and the static footprint of the Image .

Agreed, and there are other ways to de-duplicate which will still work
with binary image inputs.
Geoff Levand Dec. 2, 2015, 7:03 p.m. UTC | #7
Hi,

On Tue, 2015-12-01 at 11:02 +0000, Mark Rutland wrote:
> Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> kexec. Do you parse the Image header to figure out where to place
> things?

Yes, in the kexec user tools we use text_offset to make
enough room for the kernel, but there is also the need for
page_offset.

We need to know the page_offset to be able to do virtual to
physical address conversions.  We can calculate the page_offset
for a vmlinux image as page_offset = phdr->p_vaddr - text_offset.
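
As a worked illustration of that formula (the function names, the
phys_offset parameter, and the example addresses are assumptions made
for this sketch, not values taken from kexec-tools):

```python
def page_offset_from_vmlinux(first_load_vaddr, text_offset):
    """page_offset = phdr->p_vaddr - text_offset, for the first PT_LOAD."""
    return first_load_vaddr - text_offset


def virt_to_phys(vaddr, page_offset, phys_offset):
    """Convert a kernel linear-map virtual address to a physical address.

    `phys_offset` is the physical address that page_offset maps to
    (hypothetical parameter, typically the start of RAM).
    """
    return vaddr - page_offset + phys_offset


# Example with illustrative values: a kernel linked at
# 0xffffffc000080000 with text_offset 0x80000, and RAM at 0x40000000.
page_offset = page_offset_from_vmlinux(0xFFFFFFC000080000, 0x80000)
kernel_phys = virt_to_phys(0xFFFFFFC000080000, page_offset, 0x40000000)
```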

The binary Image currently has no info about page_offset or
virtual addressing.  We have a kexec-tools option for the
user to specify a page_offset.  If that option is not provided
we try to look at the running kernel's symbols, and if that
fails, fall back to a default page_offset.  This is less than
ideal, and certainly makes the binary Image less appealing to
use with kexec.

-Geoff
Mark Rutland Dec. 3, 2015, 12:24 p.m. UTC | #8
On Thu, Dec 03, 2015 at 12:46:35AM +1100, Carl van Schaik wrote:
> Hi,
> 
> On 1 December 2015 Ard Biesheuvel <ard.biesheuvel@linaro.org> wote:
> > On 1 December 2015 at 12:02, Mark Rutland <mark.rutland@arm.com>
> > wrote:
> > > Hi,
> > >
> > > On Mon, Nov 30, 2015 at 12:45:18PM +1100, Carl van Schaik wrote:
> > >> In commit bd00cd5f8c8c3c282bb1e1eac6a6679a4f808091, the
> > idmap_pg_dir
> > >> and swapper_pg_dir where moved from before the kernel to after it.
> > >>
> > >> The problem is that these symbols fall outside the range covered by
> > >> the ELF file - outside of any section.
> > >>
> > >> A bootloader which loads the kernel ELF file and dynamically
> > >> determines where to place the DTB, may try place it after the
> > >> kernel. We've just run into this problem and the DTB gets
> > >> overwritten as soon as the pagetables are created.
> > 
> > Could you explain why you are using the ELF file and not the binary image
> > file?
> > This is not future proof: currently, the Image is a straight binary
> > objcopy of vmlinux, but that is not guaranteed to remain that way.
> > Things like KASLR may require post build steps that mangle vmlinux or
> > Image afterwards.
> 
> The reason we've been using ELF files is mostly to do with legacy virtualization
> related reasons in our systems, we used to patch symbols in the ELFs for example
> pre device-tree. However, since it hadn't caused problems until now we had
> continued to use it. We haven't yet added Aarch64 Linux boot image header parsing
> but it should be trivial.
> 
> The other area we are looking into is optimized multi-VM static boot images by
> constructing hypervisor-bundle images containing de-duplicated Linux sections,
> allowing an ELF bootloader to populate multiple Linux VMs from a smaller boot
> image - resulting in faster boot.

Ok.

Per Ard's comments, this may get broken in future by KASLR or similar;
we cannot make strong guarantees as to the vmlinux being directly
usable. That's a different discussion, though...

> > >> I'd suggest that the kernel either:
> > >>  A. document this boot requirement for where not to load a DTB
> > >
> > > Do you have any particular suggestion?
> > >
> > > We already describe the Image footprint (including BSS and page tables)
> > > by the image_size in the Image header, which is sufficient. The size of
> > > the BSS and page tables is effectively unbound, so we can't define some
> > > upper bound that will always be true.
> > >
> > > The documentation is written on the assumption that an Image file is
> > > being used rather than a vmlinux. Perhaps that is something to consider.
> > >
> > >>  B. update the vmlinux.lds.S such that these symbols (and _end) are
> > >> properly covered by a section in the ELF, and thus preventing this
> > >> issue.
> > >
> > > I'm worried that this only solves this one case, and it means that there
> > > are two (potentially conflicting) sources of information that a
> > > bootloader might be using -- the ELF or the Image header. I don't want
> > > to have to duplicate text_offset and so on, which implies that parsing
> > > the Image header is necessary anyway.
> > >
> > > That's something we can discuss if you send a patch (inline rather than
> > > attached).
> > >
> > 
> > I think updating the linker script to put the page tables into a
> > .pgdir section is reasonable, since it is part of the static footprint
> > of the kernel.
> 
> I agree

Ok. As above, please send a standalone, inline patch for this. Please Cc
at least myself, Ard, and Catalin.

We can have any further discussion there.

> > However, I strongly feel that the Image header should remain the
> > authoritative source of information regarding the nature (big/little
> > endian, page size) and the static footprint of the Image .
> 
> Agreed, and there are other ways to de-duplicate which will still work
> with binary image inputs.

I completely agree that the Image is the canonical source of
information.

Thanks,
Mark.
Mark Rutland Dec. 3, 2015, 12:29 p.m. UTC | #9
On Wed, Dec 02, 2015 at 11:03:48AM -0800, Geoff Levand wrote:
> Hi,
> 
> On Tue, 2015-12-01 at 11:02 +0000, Mark Rutland wrote:
> > Pratyush, Geoff, I understood you were loading the kernel vmlinux for
> > kexec. Do you parse the Image header to figure out where to place
> > things?
> 
> Yes, in the kexec user tools we use text_offset to make
> enough room for the kernel, but there is also the need for
> page_offset.
> 
> We need to know the page_offset to be able to do virtual to
> physical address conversions.  We can calculate the page_offset
> for a vmlinux image as page_offset = phdr->p_vaddr - text_offset.

I don't understand why you need to do that.

Is that just so you can figure out where to load the segments
physically?

> The binary Image currently has no info about page_offset or
> virtual addressing.  We have a kexec-tools option for the
> user to specify a page_offset.  If that option is not provided
> we try to look at the running kernel's symbols, and if that
> fails, fall back to a default page_offset.  This is less than
> ideal, and certainly makes the binary Image less appealing to
> use with kexec.

I don't follow at all why this complication is necessary. I don't think
that the kexec tools should be looking at the kernel symbols in that
manner, and I don't think that we need to expose the page offset via the
Image header.

The first loaded address in the vmlinux corresponds to PHYS_OFFSET +
TEXT_OFFSET. If you know that, you can figure out an offset to apply to
VAs to convert them to PAs when loading.
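
In other words, something along the lines of this sketch, where
load_addresses and its parameters are hypothetical names and the segment
addresses would come from the vmlinux PT_LOAD program headers:

```python
def load_addresses(segment_vaddrs, phys_offset, text_offset):
    """Map each segment's link-time VA to the PA it should be loaded at.

    The lowest loaded VA is assumed to correspond to
    phys_offset + text_offset; every other segment keeps its distance
    from that first address.
    """
    first = min(segment_vaddrs)
    # Single delta applied to all VAs; no page_offset is needed.
    delta = (phys_offset + text_offset) - first
    return [vaddr + delta for vaddr in segment_vaddrs]
```

This needs only text_offset from the Image header and the loader's
choice of physical base.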

What am I missing?

Thanks,
Mark.
Patch

From 25f5edabb5a719e24cc1865eb50b7894d0f976cf Mon Sep 17 00:00:00 2001
From: Carl van Schaik <carl@cog.systems>
Date: Mon, 30 Nov 2015 12:39:59 +1100
Subject: [PATCH] arm64: place initial page tables in ELF section

The swapper_pg_dir and idmap_pg_dir are placed above the kernel image,
after the BSS. Currently, these are outside of any section in the ELF,
which means a bootloader cannot determine from the ELF file alone where
to place data (DTB etc). It needs to reserve an arbitrary amount of space.

This patch places the idmap_pg_dir and swapper_pg_dir symbols into a new
.pg_dir section. With this defined, the bootloader can safely place
images using addresses after the ELF.

Signed-off-by: Carl van Schaik <carl@cog.systems>
---
 arch/arm64/kernel/vmlinux.lds.S | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index a2c2986..759ae68 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -154,10 +154,12 @@  SECTIONS
 	BSS_SECTION(0, 0, 0)
 
 	. = ALIGN(PAGE_SIZE);
-	idmap_pg_dir = .;
-	. += IDMAP_DIR_SIZE;
-	swapper_pg_dir = .;
-	. += SWAPPER_DIR_SIZE;
+	.pg_dir : {
+		idmap_pg_dir = .;
+		. += IDMAP_DIR_SIZE;
+		swapper_pg_dir = .;
+		. += SWAPPER_DIR_SIZE;
+	}
 
 	_end = .;
 
-- 
2.3.2