[3/5] arm: decompressor: define a new zImage tag

Message ID	20200601142754.26139-4-l.stelmach@samsung.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=1A0W=7O=lists.infradead.org=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE38F207D5 From: =?utf-8?q?=C5=81ukasz_Stelmach?= <l.stelmach@samsung.com> To: Russell King <linux@armlinux.org.uk>, Masahiro Yamada <masahiroy@kernel.org>, Nick Desaulniers <ndesaulniers@google.com>, Thomas Gleixner <tglx@linutronix.de>, Enrico Weigelt <info@metux.net>, Kees Cook <keescook@chromium.org>, Ingo Molnar <mingo@kernel.org>, Ben Dooks <ben-linux@fluff.org>, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: [PATCH 3/5] arm: decompressor: define a new zImage tag Date: Mon, 1 Jun 2020 16:27:52 +0200 Message-Id: <20200601142754.26139-4-l.stelmach@samsung.com> In-Reply-To: <20200601142754.26139-1-l.stelmach@samsung.com> MIME-Version: 1.0 Organization: Samsung R&D Institute Poland CMS-TYPE: 201P References: <20200601142754.26139-1-l.stelmach@samsung.com> <CGME20200601142810eucas1p1c42ff7c9b417f04bc506261726f08b4f@eucas1p1.samsung.com> summary: Content analysis details: (-5.2 points) pts rule name description ---- ---------------------- -------------------------------------------------- -5.0 RCVD_IN_DNSWL_HI RBL: Sender listed at https://www.dnswl.org/, high trust [210.118.77.11 listed in list.dnswl.org] 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [210.118.77.11 listed in wl.mailspike.net] -0.0 SPF_PASS SPF: sender matches SPF record -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders -0.0 DKIMWL_WL_HIGH DKIMwl.org - Whitelisted High sender Precedence: list Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>, Marek Szyprowski <m.szyprowski@samsung.com>, =?utf-8?q?=C5=81ukasz_Stelmach?= <l.stelmach@samsung.com>, Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org
Series	kexec_file_load() for arm \| expand [0/5] kexec_file_load() for arm [1/5] arm: decompressor: set malloc pool size for the decompressor [2/5] arm: add image header definitions [3/5] arm: decompressor: define a new zImage tag [4/5] arm: Add kexec_image_info [5/5] arm: kexec_file: load zImage or uImage, initrd and dtb

Lukasz Stelmach June 1, 2020, 2:27 p.m. UTC

Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
requirements of the decompressor code.

Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
---
 arch/arm/boot/compressed/vmlinux.lds.S |  9 ++++++++-
 arch/arm/include/asm/image.h           | 13 +++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

Russell King (Oracle) June 1, 2020, 2:55 p.m. UTC | #1

On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
> requirements of the decompressor code.

Why do we need to know the stack and BSS size, when the userspace
kexec tool doesn't need to know this to perform the same function.

> 
> Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
> ---
>  arch/arm/boot/compressed/vmlinux.lds.S |  9 ++++++++-
>  arch/arm/include/asm/image.h           | 13 +++++++++++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
> index 308e9cd6a897..dcfdb3209c90 100644
> --- a/arch/arm/boot/compressed/vmlinux.lds.S
> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> @@ -39,6 +39,11 @@ SECTIONS
>      LONG(ARM_ZIMAGE_MAGIC3)
>      LONG(ZIMAGE_MAGIC(__piggy_size_addr - _start))
>      LONG(ZIMAGE_MAGIC(_kernel_bss_size))
> +    LONG(ZIMAGE_MAGIC(5))
> +    LONG(ARM_ZIMAGE_MAGIC4)
> +    LONG(ZIMAGE_MAGIC(_end - __bss_start))
> +    LONG(ZIMAGE_MAGIC(_stack_end - _stack_start))
> +    LONG(ZIMAGE_MAGIC(_malloc_size))
>      LONG(0)
>      _table_end = .;
>    }
> @@ -108,10 +113,12 @@ SECTIONS
>    . = BSS_START;
>    __bss_start = .;
>    .bss			: { *(.bss) }
> +  . = ALIGN(8);		/* the stack must be 64-bit aligned and adjoin bss */
>    _end = .;
>  
> -  . = ALIGN(8);		/* the stack must be 64-bit aligned */
> +  _stack_start = .;
>    .stack		: { *(.stack) }
> +  _stack_end = .;
>  
>    PROVIDE(__pecoff_data_size = ALIGN(512) - ADDR(.data));
>    PROVIDE(__pecoff_end = ALIGN(512));
> diff --git a/arch/arm/include/asm/image.h b/arch/arm/include/asm/image.h
> index d5c18a0f6a34..624438740f23 100644
> --- a/arch/arm/include/asm/image.h
> +++ b/arch/arm/include/asm/image.h
> @@ -15,6 +15,7 @@
>  #define ARM_ZIMAGE_MAGIC1 ZIMAGE_MAGIC(0x016f2818)
>  #define ARM_ZIMAGE_MAGIC2 (0x45454545)
>  #define ARM_ZIMAGE_MAGIC3 ZIMAGE_MAGIC(0x5a534c4b)
> +#define ARM_ZIMAGE_MAGIC4 ZIMAGE_MAGIC(0x5a534344)
>  
>  #ifndef __ASSEMBLY__
>  
> @@ -43,6 +44,18 @@ struct arm_zimage_tag {
>  	} u;
>  };
>  
> +struct arm_zimage_tag_dc {
> +	struct tag_header hdr;
> +	union {
> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
> +		struct zimage_decomp_size {
> +			__le32 bss_size;
> +			__le32 stack_size;
> +			__le32 malloc_size;
> +		} decomp_size;
> +	} u;
> +};
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* __ASM_IMAGE_H */
> -- 
> 2.26.2
> 
>

Lukasz Stelmach June 1, 2020, 4:19 p.m. UTC | #2

It was <2020-06-01 pon 15:55>, when Russell King - ARM Linux admin wrote:
> On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
>> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
>> requirements of the decompressor code.
>
> Why do we need to know the stack and BSS size, when the userspace
> kexec tool doesn't need to know this to perform the same function.


kexec-tools load zImage as low in DRAM as possible and rely on two
assumptions:

    + the zImage will copy itself to make enough room for the kernel,
    + sizeof(zImage+mem) < sizeof(kernel+mem), which is true because
      of compression.

       DRAM start
       + 0x8000

zImage    |-----------|-----|-------|
            text+data   bss   stack 

                 text+data           bss   
kernel    |---------------------|-------------------|


initrd                                              |-initrd-|-dtb-|


My code on the other hand tries to load the zImage high enough to make
room for the kernel without copying the zImage.


            text+data           bss   
kernel    |-----------|-------------------|

zImage                |-----------|-----|-------|
                        text+data   bss   stack 

initrd                                          |-initrd-|-dtb-|


In such case the second assumption would be

    sizeof(zImage+mem) < sizeof(kernel bss)

and I can't tell for sure it would always be true. That is why we need
detailed information about decompressor memory requirements.


>> 
>> Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
>> ---
>>  arch/arm/boot/compressed/vmlinux.lds.S |  9 ++++++++-
>>  arch/arm/include/asm/image.h           | 13 +++++++++++++
>>  2 files changed, 21 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
>> index 308e9cd6a897..dcfdb3209c90 100644
>> --- a/arch/arm/boot/compressed/vmlinux.lds.S
>> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
>> @@ -39,6 +39,11 @@ SECTIONS
>>      LONG(ARM_ZIMAGE_MAGIC3)
>>      LONG(ZIMAGE_MAGIC(__piggy_size_addr - _start))
>>      LONG(ZIMAGE_MAGIC(_kernel_bss_size))
>> +    LONG(ZIMAGE_MAGIC(5))
>> +    LONG(ARM_ZIMAGE_MAGIC4)
>> +    LONG(ZIMAGE_MAGIC(_end - __bss_start))
>> +    LONG(ZIMAGE_MAGIC(_stack_end - _stack_start))
>> +    LONG(ZIMAGE_MAGIC(_malloc_size))
>>      LONG(0)
>>      _table_end = .;
>>    }
>> @@ -108,10 +113,12 @@ SECTIONS
>>    . = BSS_START;
>>    __bss_start = .;
>>    .bss			: { *(.bss) }
>> +  . = ALIGN(8);		/* the stack must be 64-bit aligned and adjoin bss */
>>    _end = .;
>>  
>> -  . = ALIGN(8);		/* the stack must be 64-bit aligned */
>> +  _stack_start = .;
>>    .stack		: { *(.stack) }
>> +  _stack_end = .;
>>  
>>    PROVIDE(__pecoff_data_size = ALIGN(512) - ADDR(.data));
>>    PROVIDE(__pecoff_end = ALIGN(512));
>> diff --git a/arch/arm/include/asm/image.h b/arch/arm/include/asm/image.h
>> index d5c18a0f6a34..624438740f23 100644
>> --- a/arch/arm/include/asm/image.h
>> +++ b/arch/arm/include/asm/image.h
>> @@ -15,6 +15,7 @@
>>  #define ARM_ZIMAGE_MAGIC1 ZIMAGE_MAGIC(0x016f2818)
>>  #define ARM_ZIMAGE_MAGIC2 (0x45454545)
>>  #define ARM_ZIMAGE_MAGIC3 ZIMAGE_MAGIC(0x5a534c4b)
>> +#define ARM_ZIMAGE_MAGIC4 ZIMAGE_MAGIC(0x5a534344)
>>  
>>  #ifndef __ASSEMBLY__
>>  
>> @@ -43,6 +44,18 @@ struct arm_zimage_tag {
>>  	} u;
>>  };
>>  
>> +struct arm_zimage_tag_dc {
>> +	struct tag_header hdr;
>> +	union {
>> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
>> +		struct zimage_decomp_size {
>> +			__le32 bss_size;
>> +			__le32 stack_size;
>> +			__le32 malloc_size;
>> +		} decomp_size;
>> +	} u;
>> +};
>> +
>>  #endif /* __ASSEMBLY__ */
>>  
>>  #endif /* __ASM_IMAGE_H */
>> -- 
>> 2.26.2
>> 
>>

Russell King (Oracle) June 1, 2020, 6:25 p.m. UTC | #3

On Mon, Jun 01, 2020 at 06:19:52PM +0200, Lukasz Stelmach wrote:
> It was <2020-06-01 pon 15:55>, when Russell King - ARM Linux admin wrote:
> > On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
> >> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
> >> requirements of the decompressor code.
> >
> > Why do we need to know the stack and BSS size, when the userspace
> > kexec tool doesn't need to know this to perform the same function.
> 
> 
> kexec-tools load zImage as low in DRAM as possible and rely on two
> assumptions:
> 
>     + the zImage will copy itself to make enough room for the kernel,
>     + sizeof(zImage+mem) < sizeof(kernel+mem), which is true because
>       of compression.
> 
>        DRAM start
>        + 0x8000
> 
> zImage    |-----------|-----|-------|
>             text+data   bss   stack 
> 
>                  text+data           bss   
> kernel    |---------------------|-------------------|
> 
> 
> initrd                                              |-initrd-|-dtb-|

This is actually incorrect, because the decompressor will self-
relocate itself to avoid the area that it is going to decompress
into.  So, while the decompressor runs, in the above situation it
ends up as:

ram    |------------------------------------------------------...
                 text+data           bss   
kernel    |---------------------|-------------------|
zImage                          |-----------|-----|-------|
                                  text+data   bss   stack+malloc

Where "text+data" is actually smaller than the image size that
was loaded - the part of the image that performs the relocation
is discarded (the first chunk of code up to "restart" - 200
bytes.)  The BSS is typically smaller than 200 bytes, so we've
been able to get away without knowing the actual BSS size so
far.

ram    |--|-----------------------------------------|---------...
       |<>| TEXT_OFFSET
kernel    |---------------------|-------------------|
          |<----edata_size----->|<-----bss_size---->|
          |<---------------kernel_size------------->|
zImage                          |-----------|-----|-------|
                                |<-------len------->| (initial)
				|<----------len------------>| (final)

The "final" len value is what the decompressor prints as the "zImage
requires" debugging value.

Hence, the size that the decompressed kernel requires is kernel_size.

The size that the decompressor requires is edata_size + len(final).

Now, if you intend to load the kernel to ram + TEXT_OFFSET + edata_size
then it isn't going to lose the first 200 bytes of code, so as you
correctly point out, we need to know the BSS size.

> >> +struct arm_zimage_tag_dc {
> >> +	struct tag_header hdr;
> >> +	union {
> >> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
> >> +		struct zimage_decomp_size {
> >> +			__le32 bss_size;
> >> +			__le32 stack_size;
> >> +			__le32 malloc_size;
> >> +		} decomp_size;

You certainly don't need to know all this.  All you need to know is
how much space the decompressor requires after the end of the image,
encompassing the BSS size, stack size and malloc size, which is one
value.

Lukasz Stelmach June 1, 2020, 8:27 p.m. UTC | #4

It was <2020-06-01 pon 19:25>, when Russell King - ARM Linux admin wrote:
> On Mon, Jun 01, 2020 at 06:19:52PM +0200, Lukasz Stelmach wrote:
>> It was <2020-06-01 pon 15:55>, when Russell King - ARM Linux admin wrote:
>> > On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
>> >> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
>> >> requirements of the decompressor code.
>> >
>> > Why do we need to know the stack and BSS size, when the userspace
>> > kexec tool doesn't need to know this to perform the same function.
>> 
>> 
>> kexec-tools load zImage as low in DRAM as possible and rely on two
>> assumptions:
>> 
>>     + the zImage will copy itself to make enough room for the kernel,
>>     + sizeof(zImage+mem) < sizeof(kernel+mem), which is true because
>>       of compression.
>> 
>>        DRAM start
>>        + 0x8000
>> 
>> zImage    |-----------|-----|-------|
>>             text+data   bss   stack 
>> 
>>                  text+data           bss   
>> kernel    |---------------------|-------------------|
>> 
>> 
>> initrd                                              |-initrd-|-dtb-|
>
> This is actually incorrect, because the decompressor will self-
> relocate itself to avoid the area that it is going to decompress
> into.

I described the state right after kexec(8) invocation.

> So, while the decompressor runs, in the above situation it
> ends up as:
>
>
> ram    |------------------------------------------------------...
>                  text+data           bss   
> kernel    |---------------------|-------------------|
> zImage                          |-----------|-----|-------|
>                                   text+data   bss   stack+malloc

And I am trying to achieve this state before the decompressor starts so
it won't need to copy itself during boot. The only exception is (as we
discussed under a different patch) when edata_size >= 128-eps MiB because
loading zImage above 128 MiB prevents it from properly detecting
physical memory. In such unlikely case my code behaves like kexec-tools
and loads zImage low. That is why I suggested that passing detailed
information about memory layout to the decompressor would help.

> Where "text+data" is actually smaller than the image size that
> was loaded - the part of the image that performs the relocation
> is discarded (the first chunk of code up to "restart" - 200
> bytes.)  The BSS is typically smaller than 200 bytes, so we've
> been able to get away without knowing the actual BSS size so
> far.
>
>
> ram    |--|-----------------------------------------|---------...
>        |<>| TEXT_OFFSET
> kernel    |---------------------|-------------------|
>           |<----edata_size----->|<-----bss_size---->|
>           |<---------------kernel_size------------->|
> zImage                          |-----------|-----|-------|
>                                 |<-------len------->| (initial)
> 				|<----------len------------>| (final)
>
> The "final" len value is what the decompressor prints as the "zImage
> requires" debugging value.
>
> Hence, the size that the decompressed kernel requires is kernel_size.
>
> The size that the decompressor requires is edata_size + len(final).
>
> Now, if you intend to load the kernel to ram + TEXT_OFFSET + edata_size
> then it isn't going to lose the first 200 bytes of code, so as you
> correctly point out, we need to know the BSS size.

Formal note: can we keep using terms zImage and kernel as separate,
where zImage is what is loaded with kexec and kernel is the decompressed
code loaded at TEXT_OFFSET. I believe, it will help us avoid mistakes.

>> >> +struct arm_zimage_tag_dc {
>> >> +	struct tag_header hdr;
>> >> +	union {
>> >> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
>> >> +		struct zimage_decomp_size {
>> >> +			__le32 bss_size;
>> >> +			__le32 stack_size;
>> >> +			__le32 malloc_size;
>> >> +		} decomp_size;
>
> You certainly don't need to know all this.  All you need to know is
> how much space the decompressor requires after the end of the image,
> encompassing the BSS size, stack size and malloc size, which is one
> value.

I agree. However, since we are not fighting here for every single byte,
I'd rather add them as separate values and make the tag, even if only
slightly, more future-proof.

Kind regards,

Russell King (Oracle) June 1, 2020, 8:41 p.m. UTC | #5

On Mon, Jun 01, 2020 at 10:27:45PM +0200, Lukasz Stelmach wrote:
> It was <2020-06-01 pon 19:25>, when Russell King - ARM Linux admin wrote:
> > On Mon, Jun 01, 2020 at 06:19:52PM +0200, Lukasz Stelmach wrote:
> >> It was <2020-06-01 pon 15:55>, when Russell King - ARM Linux admin wrote:
> >> > On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
> >> >> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
> >> >> requirements of the decompressor code.
> >> >
> >> > Why do we need to know the stack and BSS size, when the userspace
> >> > kexec tool doesn't need to know this to perform the same function.
> >> 
> >> 
> >> kexec-tools load zImage as low in DRAM as possible and rely on two
> >> assumptions:
> >> 
> >>     + the zImage will copy itself to make enough room for the kernel,
> >>     + sizeof(zImage+mem) < sizeof(kernel+mem), which is true because
> >>       of compression.
> >> 
> >>        DRAM start
> >>        + 0x8000
> >> 
> >> zImage    |-----------|-----|-------|
> >>             text+data   bss   stack 
> >> 
> >>                  text+data           bss   
> >> kernel    |---------------------|-------------------|
> >> 
> >> 
> >> initrd                                              |-initrd-|-dtb-|
> >
> > This is actually incorrect, because the decompressor will self-
> > relocate itself to avoid the area that it is going to decompress
> > into.
> 
> I described the state right after kexec(8) invocation.

Actually, you haven't, because this is _not_ how kexec(8) lays it
out, as I attempted to detail further down in my reply.

> > So, while the decompressor runs, in the above situation it
> > ends up as:
> >
> >
> > ram    |------------------------------------------------------...
> >                  text+data           bss   
> > kernel    |---------------------|-------------------|
> > zImage                          |-----------|-----|-------|
> >                                   text+data   bss   stack+malloc

Note here - if the initrd was placed as _you_ describe at the end
of where the zImage ends up including its BSS, it would be
corrupted by the stack and malloc space of the decompressor while
running.  Ergo, your description of how kexec(8) lays stuff out
is incorrect.

> > Where "text+data" is actually smaller than the image size that
> > was loaded - the part of the image that performs the relocation
> > is discarded (the first chunk of code up to "restart" - 200
> > bytes.)  The BSS is typically smaller than 200 bytes, so we've
> > been able to get away without knowing the actual BSS size so
> > far.
> >
> >
> > ram    |--|-----------------------------------------|---------...
> >        |<>| TEXT_OFFSET
> > kernel    |---------------------|-------------------|
> >           |<----edata_size----->|<-----bss_size---->|
> >           |<---------------kernel_size------------->|
> > zImage                          |-----------|-----|-------|
> >                                 |<-------len------->| (initial)
> > 				|<----------len------------>| (final)
> >
> > The "final" len value is what the decompressor prints as the "zImage
> > requires" debugging value.
> >
> > Hence, the size that the decompressed kernel requires is kernel_size.
> >
> > The size that the decompressor requires is edata_size + len(final).
> >
> > Now, if you intend to load the kernel to ram + TEXT_OFFSET + edata_size
> > then it isn't going to lose the first 200 bytes of code, so as you
> > correctly point out, we need to know the BSS size.
> 
> Formal note: can we keep using terms zImage and kernel as separate,
> where zImage is what is loaded with kexec and kernel is the decompressed
> code loaded at TEXT_OFFSET. I believe, it will help us avoid mistakes.
> 
> >> >> +struct arm_zimage_tag_dc {
> >> >> +	struct tag_header hdr;
> >> >> +	union {
> >> >> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
> >> >> +		struct zimage_decomp_size {
> >> >> +			__le32 bss_size;
> >> >> +			__le32 stack_size;
> >> >> +			__le32 malloc_size;
> >> >> +		} decomp_size;
> >
> > You certainly don't need to know all this.  All you need to know is
> > how much space the decompressor requires after the end of the image,
> > encompassing the BSS size, stack size and malloc size, which is one
> > value.
> 
> I agree. However, since we are not fighting here for every single byte,
> I'd rather add them as separate values and make the tag, even if only
> slightly, more future-proof.

It doesn't make it more future-proof.  What happens if we add something
else, do we need to list it separately, and then change the kernel to
accept the new value and maybe also kexec(8)?  Or do we just say "the
decompressor needs X many bytes after the image" and be done with it?
The latter sounds way more future-proof to me.

Lukasz Stelmach June 2, 2020, 4:17 p.m. UTC | #6

It was <2020-06-01 pon 21:41>, when Russell King - ARM Linux admin wrote:
 > On Mon, Jun 01, 2020 at 10:27:45PM +0200, Lukasz Stelmach wrote:
>> It was <2020-06-01 pon 19:25>, when Russell King - ARM Linux admin wrote:
>>> On Mon, Jun 01, 2020 at 06:19:52PM +0200, Lukasz Stelmach wrote:
>>>> It was <2020-06-01 pon 15:55>, when Russell King - ARM Linux admin wrote:
>>>>> On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
>>>>>> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
>>>>>> requirements of the decompressor code.
>>>>>
>>>>> Why do we need to know the stack and BSS size, when the userspace
>>>>> kexec tool doesn't need to know this to perform the same function.
>>>> 
>>>> 
>>>> kexec-tools load zImage as low in DRAM as possible and rely on two
>>>> assumptions:
>>>> 
>>>>     + the zImage will copy itself to make enough room for the kernel,
>>>>     + sizeof(zImage+mem) < sizeof(kernel+mem), which is true because
>>>>       of compression.
>>>> 
>>>>        DRAM start
>>>>        + 0x8000
>>>> 
>>>> zImage    |-----------|-----|-------|
>>>>             text+data   bss   stack 
>>>> 
>>>>                  text+data           bss   
>>>> kernel    |---------------------|-------------------|
>>>> 
>>>> 
>>>> initrd                                              |-initrd-|-dtb-|
>>>
>>> This is actually incorrect, because the decompressor will self-
>>> relocate itself to avoid the area that it is going to decompress
>>> into.
>> 
>> I described the state right after kexec(8) invocation.
>
> Actually, you haven't, because this is _not_ how kexec(8) lays it
> out, as I attempted to detail further down in my reply.


Let me try to describe how I understand the code in kexec-tools
(commit 74c7c369).

--8<---------------cut here---------------start------------->8---
int zImage_arm_load(…, const char *buf, off_t len, …)
// buf - zImage
// len - size of zImage
        
unsigned int extra_size = 0x8000; /* TEXT_OFFSET */

kernel_mem_size = len + 4;

// locate a hole to fit zImage + 32kB as low as possible,
base = locate_hole(info, len + extra_size, 0, 0, ULONG_MAX, INT_MAX);

kernel_base = base + extra_size;

add_segment(info, buf, len, kernel_base, kernel_mem_size);
--8<---------------cut here---------------end--------------->8---

Therefore, zImage is loaded low and always requires relocation.
        
ram     |--------------------------------------------------------------
zImage     |----k_m_s----|
           ^
           |
           kernel_base — TEXT_OFFSET or higher

Next goes initrd

--8<---------------cut here---------------start------------->8---
kexec_arm_image_size = len * 5; // or passed on command line

// if the tag exists
kexec_arm_image_size = max(edata_size + bss_size,
                           edata_size + len); // len - zImage size + 64 kB 

initrd_base = kernel_base + _ALIGN(kexec_arm_image_size, page_size);

add_segment(info, ramdisk_buf, initrd_size, initrd_base, initrd_size);
--8<---------------cut here---------------end--------------->8---

above whatever is bigger (kernel + kernel bss) OR (kernel + zImage + zImage mem).


ram     |---------------------------------------------------------------
zImage     |----k_m_s----|                   Where kexec loads zImage @kernel_base

           |.............len * 5....................| Fallback
kernel     |.....edata.....|...bss...|       These are just calculations
zImage                     |.....len+....|   zImage will copy itself here WHEN it runs

initrd                                   |--initrd_size--|
dtb                                      ^               |---|
                                         |
                                 initrd_base

DTB, of course, goes next

    dtb_offset = initrd_base + initrd_size + page_size;


Stuff marked with "-" is actually loaded, "." are just calculations to
establish initrd_base.

>>> So, while the decompressor runs, in the above situation it
>>> ends up as:
>>>
>>>
>>> ram    |------------------------------------------------------...
>>>                  text+data           bss   
>>> kernel    |---------------------|-------------------|
>>> zImage                          |-----------|-----|-------|
>>>                                   text+data   bss   stack+malloc
>
> Note here - if the initrd was placed as _you_ describe at the end
> of where the zImage ends up including its BSS, it would be
> corrupted by the stack and malloc space of the decompressor while
> running.  Ergo, your description of how kexec(8) lays stuff out
> is incorrect.

Is my analysis above accurate now? Do I understand this?

As you noted, my intention is to load zImage after edata (dotted len+
above).

>>>>>> +struct arm_zimage_tag_dc {
>>>>>> +	struct tag_header hdr;
>>>>>> +	union {
>>>>>> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
>>>>>> +		struct zimage_decomp_size {
>>>>>> +			__le32 bss_size;
>>>>>> +			__le32 stack_size;
>>>>>> +			__le32 malloc_size;
>>>>>> +		} decomp_size;
>>>
>>> You certainly don't need to know all this.  All you need to know is
>>> how much space the decompressor requires after the end of the image,
>>> encompassing the BSS size, stack size and malloc size, which is one
>>> value.
>> 
>> I agree. However, since we are not fighting here for every single byte,
>> I'd rather add them as separate values and make the tag, even if only
>> slightly, more future-proof.
>
> It doesn't make it more future-proof.  What happens if we add something
> else, do we need to list it separately, and then change the kernel to
> accept the new value and maybe also kexec(8)?  Or do we just say "the
> decompressor needs X many bytes after the image" and be done with it?
> The latter sounds way more future-proof to me.

You are right. I changed it to a single value. Done.

[3/5] arm: decompressor: define a new zImage tag

Commit Message

Comments

Patch