Compressed kernels currently won't boot

Message ID 1564591443.3319.30.camel@HansenPartnership.com
State Superseded

Commit Message

James Bottomley July 31, 2019, 4:44 p.m. UTC
I noticed this trying to test out compressed kernel booting.  The
problem is that a compressed kernel is divided into two pieces, one of
which starts at 0x000e0000 and is the bootstrap code which is
uncompressed into 0x00100000 and the rest of which is the real
compressed kernel which is loaded above the end of the current
decompressed size of the entire kernel.  palo decompresses the head and
jumps to it and it then decompresses the rest of the kernel into place.
 This means that the first part of the compressed image can't be larger
than 0x20000 == 131072 because otherwise it will be loaded into an area
that decompression will alter.
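
The size limit follows from the two addresses alone. A minimal Python sketch of that arithmetic, using only the addresses given above (the helper name `first_segment_fits` is made up for illustration):

```python
# Addresses taken from the description above.
BOOTSTRAP_LOAD_ADDR = 0x000E0000   # where the first piece is loaded
KERNEL_TEXT_START = 0x00100000     # where decompression starts writing

# Room available before the first piece reaches the decompression target.
MAX_FIRST_PIECE = KERNEL_TEXT_START - BOOTSTRAP_LOAD_ADDR


def first_segment_fits(size: int) -> bool:
    """Return True if a first piece of `size` bytes ends at or below 0x00100000."""
    return BOOTSTRAP_LOAD_ADDR + size <= KERNEL_TEXT_START


print(hex(MAX_FIRST_PIECE), MAX_FIRST_PIECE)
```

This only checks address ranges; it says nothing about what the loader or decompressor does with an overlapping region.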

The problem is that a change was introduced by 

commit 34c201ae49fe9e0bf3b389da5869d810f201c740
Author: Helge Deller <deller@gmx.de>
Date:   Mon Oct 15 22:14:01 2018 +0200

    parisc: Include compressed vmlinux file in vmlinuz boot kernel
 

which moved the compressed vmlinux from the second segment to the
first, and that is what makes it too big for me.  This patch, which
reverts that piece, allows me to boot again.

James

---

Comments

Sven Schnelle July 31, 2019, 5:30 p.m. UTC | #1
Hi,

On Wed, Jul 31, 2019 at 09:44:03AM -0700, James Bottomley wrote:
> I noticed this trying to test out compressed kernel booting.  The
> problem is that a compressed kernel is divided into two pieces, one of
> which starts at 0x000e0000 and is the bootstrap code which is
> uncompressed into 0x00100000 and the rest of which is the real
> compressed kernel which is loaded above the end of the current
> decompressed size of the entire kernel.  palo decompresses the head and
> jumps to it and it then decompresses the rest of the kernel into place.
>  This means that the first part of the compressed image can't be larger
> than 0x20000 == 131072 because otherwise it will be loaded into an area
> that decompression will alter.
> 
> The problem is that a change was introduced by 
> 
> commit 34c201ae49fe9e0bf3b389da5869d810f201c740
> Author: Helge Deller <deller@gmx.de>
> Date:   Mon Oct 15 22:14:01 2018 +0200

Hmm. This is what I've been facing as well. After reading this commit I'm not
sure that the patch I've just sent ("parisc: strip debug information when
building compressed images") is really wanted. However, it is really a pain
to always copy huge lifimages around when booting parisc machines via LAN.
Does anyone really extract the vmlinux file from a compressed kernel image?
Should we keep that?

Regards
Sven
James Bottomley July 31, 2019, 5:50 p.m. UTC | #2
On Wed, 2019-07-31 at 19:30 +0200, Sven Schnelle wrote:
> Hi,
> 
> On Wed, Jul 31, 2019 at 09:44:03AM -0700, James Bottomley wrote:
> > I noticed this trying to test out compressed kernel booting.  The
> > problem is that a compressed kernel is divided into two pieces, one
> > of which starts at 0x000e0000 and is the bootstrap code which is
> > uncompressed into 0x00100000 and the rest of which is the real
> > compressed kernel which is loaded above the end of the current
> > decompressed size of the entire kernel.  palo decompresses the head
> > and jumps to it and it then decompresses the rest of the kernel
> > into place.  This means that the first part of the compressed image
> > can't be larger than 0x20000 == 131072 because otherwise it will be
> > loaded into an area that decompression will alter.
> > 
> > The problem is that a change was introduced by 
> > 
> > commit 34c201ae49fe9e0bf3b389da5869d810f201c740
> > Author: Helge Deller <deller@gmx.de>
> > Date:   Mon Oct 15 22:14:01 2018 +0200
> 
> Hmm. This is what i've been facing as well.

Yes, except you're a more extreme case than me ... you actually have
the compressed segment overlapping the end of the decompressed text.
That does seem to mean we have a lot of no-load debug information which
isn't useful to the compressed image.

>  After reading this commit i'm not sure that the patch i've just sent
> ("parisc: strip debug information when building compressed images")
> is really wanted. However, it is really a pain to always copy huge
> lifimages around when booting parisc machines via LAN. Does someone
> really extract the vmlinux file from a compressed kernel images?
> Should we keep that?

Well, it's a thing.  There's a script in the kernel source tree

scripts/extract-vmlinux

that does it.  It doesn't seem to be packaged by debian, though.

James
James Bottomley July 31, 2019, 7:40 p.m. UTC | #3
On Wed, 2019-07-31 at 10:50 -0700, James Bottomley wrote:
> On Wed, 2019-07-31 at 19:30 +0200, Sven Schnelle wrote:
> > Hi,
> > 
> > On Wed, Jul 31, 2019 at 09:44:03AM -0700, James Bottomley wrote:
> > > > I noticed this trying to test out compressed kernel booting.  The
> > > > problem is that a compressed kernel is divided into two pieces,
> > > > one of which starts at 0x000e0000 and is the bootstrap code which
> > > > is uncompressed into 0x00100000 and the rest of which is the real
> > > > compressed kernel which is loaded above the end of the current
> > > > decompressed size of the entire kernel.  palo decompresses the
> > > > head and jumps to it and it then decompresses the rest of the
> > > > kernel into place.  This means that the first part of the
> > > > compressed image can't be larger than 0x20000 == 131072 because
> > > > otherwise it will be loaded into an area that decompression will
> > > > alter.
> > > 
> > > The problem is that a change was introduced by 
> > > 
> > > commit 34c201ae49fe9e0bf3b389da5869d810f201c740
> > > Author: Helge Deller <deller@gmx.de>
> > > Date:   Mon Oct 15 22:14:01 2018 +0200
> > 
> > Hmm. This is what i've been facing as well.
> 
> > Yes, except you're a more extreme case than me ... you actually have
> > the compressed segment overlapping the end of the decompressed text.
> > That does seem to mean we have a lot of no-load debug information
> > which isn't useful to the compressed image.
> 
> >  After reading this commit i'm not sure that the patch i've just
> > sent ("parisc: strip debug information when building compressed
> > images") is really wanted. However, it is really a pain to always
> > copy huge lifimages around when booting parisc machines via LAN.
> > Does someone really extract the vmlinux file from a compressed
> > kernel images? Should we keep that?
> 
> Well, it's a thing.  There's a script in the kernel source tree
> 
> scripts/extract-vmlinux
> 
> that does it.  It doesn't seem to be packaged by debian, though.

What about having the compressed make build both a stripped and a
non-stripped bzImage (say sbzImage and bzImage)?  That way you always
have the stripped one available for small-size uses like booting from
tape or DVD, but in the usual case we use the bzImage with full
contents.

James
Sven Schnelle July 31, 2019, 7:44 p.m. UTC | #4
Hi James,

On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:

> What about causing the compressed make to build both a stripped and a
> non-stripped bzImage (say sbzImage and bzImage).  That way you always
> have the stripped one available for small size things like boot from
> tape or DVD?  but in the usual case we use the bzImage with full
> contents.

In that case we would also need to build two lifimages - how about adding
a config option? Something like "Strip debug information from compressed
kernel images"?

Regards
Sven
Helge Deller July 31, 2019, 7:46 p.m. UTC | #5
On 31.07.19 21:44, Sven Schnelle wrote:
> Hi James,
>
> On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
>
>> What about causing the compressed make to build both a stripped and a
>> non-stripped bzImage (say sbzImage and bzImage).  That way you always
>> have the stripped one available for small size things like boot from
>> tape or DVD?  but in the usual case we use the bzImage with full
>> contents.
>
> In that case we would also need to build two lifimages - how about adding
> a config option? Something like "Strip debug information from compressed
> kernel images"?

I agree, two lifimages don't make sense. Only one vmlinuz gets installed.
Instead of the config option, I think my latest patch is better.

Helge
James Bottomley July 31, 2019, 7:56 p.m. UTC | #6
On Wed, 2019-07-31 at 21:46 +0200, Helge Deller wrote:
> On 31.07.19 21:44, Sven Schnelle wrote:
> > Hi James,
> > 
> > On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
> > 
> > > What about causing the compressed make to build both a stripped
> > > and a non-stripped bzImage (say sbzImage and bzImage).  That way
> > > you always have the stripped one available for small size things
> > > like boot from tape or DVD?  but in the usual case we use the
> > > bzImage with full contents.
> > 
> > In that case we would also need to build two lifimages - how about
> > > adding a config option? Something like "Strip debug
> > information from compressed kernel images"?
> 
> I agree, two lifimages don't make sense. Only one vmlinuz gets
> > installed. Instead of the config option, I think my latest patch is
> better.

It doesn't solve the problem that if a stripped compressed image is
larger than 128kB then it overwrites the decompress area starting at
0x00100000, so we can't decompress the end because we've already
overwritten it before the decompressor gets to it.

What we could possibly do is be clever and align the .rodata.compressed
so its last text byte ends where the uncompressed kernel text would
end.  We could be even more clever and split .rodata.compressed into a
load and a noload part so we would only load the part of the compressed
kernel we need.  Then the lifimage creation scripts could discard the
noload part containing the debug symbols.

James
Helge Deller July 31, 2019, 7:57 p.m. UTC | #7
On 31.07.19 18:44, James Bottomley wrote:
> I noticed this trying to test out compressed kernel booting.  The
> problem is that a compressed kernel is divided into two pieces, one of
> which starts at 0x000e0000 and is the bootstrap code which is
> uncompressed into 0x00100000 and the rest of which is the real
> compressed kernel which is loaded above the end of the current
> decompressed size of the entire kernel.  palo decompresses the head and
> jumps to it and it then decompresses the rest of the kernel into place.
>   This means that the first part of the compressed image can't be larger
> than 0x20000 == 131072 because otherwise it will be loaded into an area
> that decompression will alter.
>
> The problem is that a change was introduced by
>
> commit 34c201ae49fe9e0bf3b389da5869d810f201c740
> Author: Helge Deller <deller@gmx.de>
> Date:   Mon Oct 15 22:14:01 2018 +0200
>
>      parisc: Include compressed vmlinux file in vmlinuz boot kernel
>
>
> Which moved the compressed vmlinux from the second segment to the
> first, which is what makes it too big for me.  This patch reverting
> that piece allows me to boot again.

There are at least two requirements:
1. Make sure not to use too much memory for "old" machines. Otherwise
you won't be able to boot a compressed kernel on e.g. a 16MB machine.

If you move the compressed data behind where the kernel would
extract itself, you double the amount of memory required.
I think with the patch below I won't be able to boot my 715/64
any longer.

2. Old palo versions had a bug which prevented the ELF loader
from loading sections above 16MB. So, one needs to keep everything
thin in low memory without extracting over oneself.

3. There might have been other reasons too, but currently I
don't remember :-)

I believe the patch I sent for arch/parisc/boot/compressed/vmlinux.lds.S:
+       /* bootloader code and data starts at least behind area of extracted kernel */
+       . = MAX(ABSOLUTE(.), (SZ_end - SZparisc_kernel_start + KERNEL_BINARY_TEXT_START));
keeps everything bootable (on low-memory-machines and with palo ELF bootloader bug).
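
To illustrate what the MAX() expression computes, here is a small Python model. It is only a sketch: the location-counter and SZ_end values below are invented for the example (in a real build they come from the linker and from the uncompressed vmlinux), and `bootloader_start` is a made-up name. KERNEL_BINARY_TEXT_START is taken as 0x00100000, as stated elsewhere in this thread.

```python
def bootloader_start(location_counter: int, sz_end: int,
                     sz_kernel_start: int, text_start: int) -> int:
    """Model of:
    . = MAX(ABSOLUTE(.), SZ_end - SZparisc_kernel_start + KERNEL_BINARY_TEXT_START)
    """
    end_of_extracted_kernel = sz_end - sz_kernel_start + text_start
    return max(location_counter, end_of_extracted_kernel)


TEXT_START = 0x00100000  # KERNEL_BINARY_TEXT_START per this thread

# Hypothetical case 1: the compressed payload ends below the extracted kernel's end,
# so the bootloader code is pushed up to the end of the extracted kernel.
small = bootloader_start(0x00500000, 0x01100000, 0x00100000, TEXT_START)

# Hypothetical case 2: the compressed payload already reaches past that point,
# so the location counter is left where it is.
large = bootloader_start(0x02000000, 0x01100000, 0x00100000, TEXT_START)

print(hex(small), hex(large))
```

In both cases the bootloader code ends up at or above the end of the area the extracted kernel would occupy, and never below the current location counter.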

Helge

>
> diff --git a/arch/parisc/boot/compressed/vmlinux.lds.S b/arch/parisc/boot/compressed/vmlinux.lds.S
> index bfd7872739a3..5841aa373c03 100644
> --- a/arch/parisc/boot/compressed/vmlinux.lds.S
> +++ b/arch/parisc/boot/compressed/vmlinux.lds.S
> @@ -42,12 +42,6 @@ SECTIONS
>   #endif
>   	_startcode_end = .;
>
> -	/* vmlinux.bin.gz is here */
> -	. = ALIGN(8);
> -	.rodata.compressed : {
> -		*(.rodata.compressed)
> -	}
> -
>   	/* bootloader code and data starts behind area of extracted kernel */
>   	. = (SZ_end - SZparisc_kernel_start + KERNEL_BINARY_TEXT_START);
>
> @@ -73,6 +67,12 @@ SECTIONS
>   		*(.rodata.*)
>   		_erodata = . ;
>   	}
> +	/* vmlinux.bin.gz is here */
> +	. = ALIGN(8);
> +	.rodata.compressed : {
> +		*(.rodata.compressed)
> +	}
> +
>   	. = ALIGN(8);
>   	.bss : {
>   		_bss = . ;
>
Helge Deller July 31, 2019, 8:19 p.m. UTC | #8
On 31.07.19 21:56, James Bottomley wrote:
> On Wed, 2019-07-31 at 21:46 +0200, Helge Deller wrote:
>> On 31.07.19 21:44, Sven Schnelle wrote:
>>> Hi James,
>>>
>>> On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
>>>
>>>> What about causing the compressed make to build both a stripped
>>>> and a non-stripped bzImage (say sbzImage and bzImage).  That way
>>>> you always have the stripped one available for small size things
>>>> like boot from tape or DVD?  but in the usual case we use the
>>>> bzImage with full contents.
>>>
>>> In that case we would also need to build two lifimages - how about
>>> adding a config option? Something like "Strip debug
>>> information from compressed kernel images"?
>>
>> I agree, two lifimages don't make sense. Only one vmlinuz gets
>> installed. Instead of the config option, I think my latest patch is
>> better.
>
> It doesn't solve the problem that if a stripped compressed image is >
> 128kb then it overwrites the decompress area starting at 0x00100000 so
> we can't decompress the end because we've already overwritten it before
> the decompressor gets to it.

I don't get this point.
  hppa64-linux-gnu-objdump -h vmlinuz
shows:
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
   0 .head.text    00000084  00000000000e0000  00000000000e0000  00001000  2**2
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
   1 .opd          00000340  00000000000e0090  00000000000e0090  00001090  2**3
                   CONTENTS, ALLOC, LOAD, DATA
   2 .dlt          00000160  00000000000e03d0  00000000000e03d0  000013d0  2**3
                   CONTENTS, ALLOC, LOAD, DATA
   3 .rodata.compressed 01f3c2b0  00000000000e0530  00000000000e0530  00001530  2**0
                   CONTENTS, ALLOC, LOAD, DATA
   4 .text         00005cc0  000000000201d000  000000000201d000  01f3e000  2**7
                   CONTENTS, ALLOC, LOAD, READONLY, CODE
   5 .data         00000060  0000000002022cc0  0000000002022cc0  01f43cc0  2**3
                   CONTENTS, ALLOC, LOAD, DATA

Only .head.text gets loaded at e0000, and it is basically just a few bytes which
set up registers and jump to the .text segment (at 0201d000 in this case).
See: arch/parisc/boot/compressed/head.S
How should that get bigger than 128KB?

Then the code in .text decompresses the whole kernel image behind itself
(behind "data").
Then the ELF loader moves the parts from the high-memory to the final
destination (e.g. 1000000).

The steps are:
1. palo loads vmlinuz into memory.
2. vmlinuz' head starts, and decompress_kernel() in arch/parisc/boot/compressed/misc.c
decompresses the vmlinuz file to a vmlinux file and stores it to
vmlinux_addr (which is behind the bss section of the boot decompressor).
3. Then the original kernel entry is started (arch/parisc/kernel/entry.S)
which moves the code to where it belongs and starts the kernel.
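
The addresses in the objdump listing above can be cross-checked with a few lines of Python. This is only arithmetic on the numbers printed in that listing; `section_ends` is a helper name chosen for this sketch.

```python
# (name, size, load address) copied from the objdump -h output above
SECTIONS = [
    (".head.text", 0x00000084, 0x000E0000),
    (".opd", 0x00000340, 0x000E0090),
    (".dlt", 0x00000160, 0x000E03D0),
    (".rodata.compressed", 0x01F3C2B0, 0x000E0530),
    (".text", 0x00005CC0, 0x0201D000),
    (".data", 0x00000060, 0x02022CC0),
]


def section_ends(sections):
    """Map each section name to the first address past its end."""
    return {name: lma + size for name, size, lma in sections}


ends = section_ends(SECTIONS)
for name, size, lma in SECTIONS:
    print(f"{name:20s} {lma:#010x} .. {ends[name]:#010x}")
```

In this listing .rodata.compressed ends at 0x0201c7e0, just below the start of .text at 0x0201d000, and .text ends exactly where .data begins.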

Helge

> What we could possibly do is be clever and align the .rodata.compressed
> so its last text byte ends where the uncompressed kernel text would
> end.  We could be even more clever and split .rodata.compressed into a
> load and a noload part so we would only load the part of the compressed
> kernel we need.  Then the lifimage creation scripts could discard the
> noload part containing the debug symbols.
>
> James
>
James Bottomley July 31, 2019, 8:49 p.m. UTC | #9
On Wed, 2019-07-31 at 22:19 +0200, Helge Deller wrote:
> On 31.07.19 21:56, James Bottomley wrote:
> > On Wed, 2019-07-31 at 21:46 +0200, Helge Deller wrote:
> > > On 31.07.19 21:44, Sven Schnelle wrote:
> > > > Hi James,
> > > > 
> > > > On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley
> > > > wrote:
> > > > 
> > > > > What about causing the compressed make to build both a
> > > > > stripped and a non-stripped bzImage (say sbzImage and
> > > > > bzImage).  That way you always have the stripped one
> > > > > available for small size things like boot from tape or
> > > > > DVD?  but in the usual case we use the bzImage with full
> > > > > contents.
> > > > 
> > > > In that case we would also need to build two lifimages - how
> > > > about adding a config option? Something like "Strip
> > > > debug information from compressed kernel images"?
> > > 
> > > I agree, two lifimages don't make sense. Only one vmlinuz gets
> > > installed. Instead of the config option, I think my latest patch
> > > is better.
> > 
> > It doesn't solve the problem that if a stripped compressed image
> > is > 128kb then it overwrites the decompress area starting at
> > 0x00100000 so we can't decompress the end because we've already
> > overwritten it before the decompressor gets to it.
> 
> I don't get this point.
>   hppa64-linux-gnu-objdump -h vmlinuz
> shows:
> Sections:
> Idx Name          Size      VMA               LMA               File off  Algn
>    0 .head.text    00000084  00000000000e0000  00000000000e0000  00001000  2**2
>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>    1 .opd          00000340  00000000000e0090  00000000000e0090  00001090  2**3
>                    CONTENTS, ALLOC, LOAD, DATA
>    2 .dlt          00000160  00000000000e03d0  00000000000e03d0  000013d0  2**3
>                    CONTENTS, ALLOC, LOAD, DATA
>    3 .rodata.compressed 01f3c2b0  00000000000e0530  00000000000e0530  00001530  2**0
>                    CONTENTS, ALLOC, LOAD, DATA
>    4 .text         00005cc0  000000000201d000  000000000201d000  01f3e000  2**7
>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>    5 .data         00000060  0000000002022cc0  0000000002022cc0  01f43cc0  2**3
>                    CONTENTS, ALLOC, LOAD, DATA
> 
> Only .head.text gets loaded at e0000, and it is basically just a few
> bytes which sets-up registers and jump to .text segment (at 0201d000
> in this case).

Actually, you're looking at the wrong thing, you want to look at the
program header (the segments) not the section header.  It's the program
header we load.  If I extract this from the current debian kernel we
get 

jejb@ion:~/git/linux-build/arch/parisc/boot/compressed> readelf -l /boot/vmlinuz-4.19.0-5-parisc64-smp 

Elf file type is EXEC (Executable file)
Entry point 0xe0000
There are 4 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000001040 0x0000000000000000
                 0x00000000000000e0 0x00000000000000e0  R E    0x8
  LOAD           0x0000000000001000 0x00000000000e0000 0x00000000000e0000
                 0x00000000000004d8 0x00000000000004d8  RWE    0x1000
  LOAD           0x0000000000002000 0x0000000001400000 0x0000000001400000
                 0x00000000003dd46c 0x00000000003e1000  RWE    0x1000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    0x10

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .head.text .opd .dlt 
   02     .text .data .rodata .eh_frame .bss 
   03     

The two LOAD sections correspond to what palo actually loads. The
problem happens if the length of the first load section is bigger than
0x20000. Now look at what happens after your change:

jejb@ion:~/git/linux-build/build/parisc64/arch/parisc/boot> readelf -l bzImage 

Elf file type is EXEC (Executable file)
Entry point 0xe0000
There are 4 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000001040 0x0000000000000000
                 0x00000000000000e0 0x00000000000000e0  R E    0x8
  LOAD           0x0000000000001000 0x00000000000e0000 0x00000000000e0000
                 0x00000000004ae760 0x00000000004ae760  RWE    0x1000
  LOAD           0x00000000004b0000 0x000000000118a000 0x000000000118a000
                 0x0000000000006044 0x000000000000a000  RWE    0x1000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    0x10

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .head.text .opd .dlt .rodata.compressed 
   02     .text .data .rodata .eh_frame .bss 
   03     

So the first section tries to load between 0x000e0000-0x0058e760 and
that's overwritten at 0x00100000 when the decompression starts because
0x00100000 is our KERNEL_BINARY_TEXT_START.  The result for me is that
I get the Decompressing linux ... message followed by a HPMC.
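
The address ranges in the two readelf listings above can be compared mechanically. The following Python sketch uses only the PhysAddr and FileSiz values of the first LOAD entry in each listing; the function name `overlaps_decompress_area` is invented here. It checks ranges only and does not model what palo or the decompressor actually does with them.

```python
KERNEL_BINARY_TEXT_START = 0x00100000

# label -> (PhysAddr, FileSiz) of the first LOAD entry in each readelf listing above
FIRST_LOAD = {
    "debian 4.19.0-5": (0x000E0000, 0x4D8),
    "bzImage after the change": (0x000E0000, 0x4AE760),
}


def overlaps_decompress_area(phys_addr: int, file_size: int) -> bool:
    """True if the segment extends past KERNEL_BINARY_TEXT_START."""
    return phys_addr + file_size > KERNEL_BINARY_TEXT_START


for label, (addr, size) in FIRST_LOAD.items():
    end = addr + size
    print(f"{label}: {addr:#x}-{end:#x} "
          f"overlap={overlaps_decompress_area(addr, size)}")
```

The first segment of the Debian kernel ends at 0xe04d8, well below 0x00100000, while the first segment of the new bzImage ends at 0x58e760.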

James
James Bottomley July 31, 2019, 9:01 p.m. UTC | #10
On Wed, 2019-07-31 at 21:44 +0200, Sven Schnelle wrote:
> Hi James,
> 
> On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
> 
> > What about causing the compressed make to build both a stripped and
> > a non-stripped bzImage (say sbzImage and bzImage).  That way you
> > always have the stripped one available for small size things like
> > boot from tape or DVD?  but in the usual case we use the bzImage
> > with full contents.
> 
> In that case we would also need to build two lifimages - how about
> adding a config option? Something like "Strip debug
> information from compressed kernel images"?

Actually, I just looked at what x86 does.  It has this in the
arch/x86/boot/compressed/Makefile:

OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
$(obj)/vmlinux.bin: vmlinux FORCE
	$(call if_changed,objcopy)

So it basically strips all the debug information from the kernel before
compressing, which argues there's no need to retain the information
because x86 doesn't bother.

James
Sven Schnelle July 31, 2019, 9:08 p.m. UTC | #11
Hi,

On Wed, Jul 31, 2019 at 02:01:34PM -0700, James Bottomley wrote:
> On Wed, 2019-07-31 at 21:44 +0200, Sven Schnelle wrote:
> > Hi James,
> > 
> > On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
> > 
> > > What about causing the compressed make to build both a stripped and
> > > a non-stripped bzImage (say sbzImage and bzImage).  That way you
> > > always have the stripped one available for small size things like
> > > boot from tape or DVD?  but in the usual case we use the bzImage
> > > with full contents.
> > 
> > In that case we would also need to build two lifimages - how about
> > adding a config option? Something like "Strip debug
> > information from compressed kernel images"?
> 
> Actually, I just looked at what x86 does.  It has this in the
> arch/x86/boot/compressed/Makefile:
> 
> OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
> $(obj)/vmlinux.bin: vmlinux FORCE
> 	$(call if_changed,objcopy)
> 
> So it basically strips all the debug information from the kernel before
> compressing, which argues there's no need to retain the information
> because x86 doesn't bother.

Nice. So we could convince Helge by saying "Look, x86 is also stripping it"! :-)

Regards
Sven
Helge Deller July 31, 2019, 9:13 p.m. UTC | #12
On 31.07.19 23:08, Sven Schnelle wrote:
> Hi,
>
> On Wed, Jul 31, 2019 at 02:01:34PM -0700, James Bottomley wrote:
>> On Wed, 2019-07-31 at 21:44 +0200, Sven Schnelle wrote:
>>> Hi James,
>>>
>>> On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
>>>
>>>> What about causing the compressed make to build both a stripped and
>>>> a non-stripped bzImage (say sbzImage and bzImage).  That way you
>>>> always have the stripped one available for small size things like
>>>> boot from tape or DVD?  but in the usual case we use the bzImage
>>>> with full contents.
>>>
>>> In that case we would also need to build two lifimages - how about
>>> adding a config option? Something like "Strip debug
>>> information from compressed kernel images"?
>>
>> Actually, I just looked at what x86 does.  It has this in the
>> arch/x86/boot/compressed/Makefile:
>>
>> OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
>> $(obj)/vmlinux.bin: vmlinux FORCE
>> 	$(call if_changed,objcopy)
>>
>> So it basically strips all the debug information from the kernel before
>> compressing, which argues there's no need to retain the information
>> because x86 doesn't bother.
>
> Nice. So we could convince Helge by saying "Look, x86 is also stripping it"! :-)

I'm fine with doing exactly what x86 does :-)

Helge
Helge Deller July 31, 2019, 9:44 p.m. UTC | #13
On 31.07.19 22:49, James Bottomley wrote:
> On Wed, 2019-07-31 at 22:19 +0200, Helge Deller wrote:
>> On 31.07.19 21:56, James Bottomley wrote:
>>> On Wed, 2019-07-31 at 21:46 +0200, Helge Deller wrote:
>>>> On 31.07.19 21:44, Sven Schnelle wrote:
>>>>> Hi James,
>>>>>
>>>>> On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley
>>>>> wrote:
>>>>>
>>>>>> What about causing the compressed make to build both a
>>>>>> stripped and a non-stripped bzImage (say sbzImage and
>>>>>> bzImage).  That way you always have the stripped one
>>>>>> available for small size things like boot from tape or
>>>>>> DVD?  but in the usual case we use the bzImage with full
>>>>>> contents.
>>>>>
>>>>> In that case we would also need to build two lifimages - how
>>>>> about adding a config option? Something like "Strip
>>>>> debug information from compressed kernel images"?
>>>>
>>>> I agree, two lifimages don't make sense. Only one vmlinuz gets
>>>> installed. Instead of the config option, I think my latest patch
>>>> is better.
>>>
>>> It doesn't solve the problem that if a stripped compressed image
>>> is > 128kb then it overwrites the decompress area starting at
>>> 0x00100000 so we can't decompress the end because we've already
>>> overwritten it before the decompressor gets to it.
>>
>> I don't get this point.
>>    hppa64-linux-gnu-objdump -h vmlinuz
>> shows:
>> Sections:
>> Idx Name          Size      VMA               LMA               File off  Algn
>>    0 .head.text    00000084  00000000000e0000  00000000000e0000  00001000  2**2
>>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>>    1 .opd          00000340  00000000000e0090  00000000000e0090  00001090  2**3
>>                    CONTENTS, ALLOC, LOAD, DATA
>>    2 .dlt          00000160  00000000000e03d0  00000000000e03d0  000013d0  2**3
>>                    CONTENTS, ALLOC, LOAD, DATA
>>    3 .rodata.compressed 01f3c2b0  00000000000e0530  00000000000e0530  00001530  2**0
>>                    CONTENTS, ALLOC, LOAD, DATA
>>    4 .text         00005cc0  000000000201d000  000000000201d000  01f3e000  2**7
>>                    CONTENTS, ALLOC, LOAD, READONLY, CODE
>>    5 .data         00000060  0000000002022cc0  0000000002022cc0  01f43cc0  2**3
>>                    CONTENTS, ALLOC, LOAD, DATA
>>
>> Only .head.text gets loaded at e0000, and it is basically just a few
>> bytes which sets-up registers and jump to .text segment (at 0201d000
>> in this case).
>
> Actually, you're looking at the wrong thing, you want to look at the
> program header (the segments) not the section header.  It's the program
> header we load.  If I extract this from the current debian kernel we
> get
>
> jejb@ion:~/git/linux-build/arch/parisc/boot/compressed> readelf -l /boot/vmlinuz-4.19.0-5-parisc64-smp
>
> Elf file type is EXEC (Executable file)
> Entry point 0xe0000
> There are 4 program headers, starting at offset 64
>
> Program Headers:
>    Type           Offset             VirtAddr           PhysAddr
>                   FileSiz            MemSiz              Flags  Align
>    PHDR           0x0000000000000040 0x0000000000001040 0x0000000000000000
>                   0x00000000000000e0 0x00000000000000e0  R E    0x8
>    LOAD           0x0000000000001000 0x00000000000e0000 0x00000000000e0000
>                   0x00000000000004d8 0x00000000000004d8  RWE    0x1000
>    LOAD           0x0000000000002000 0x0000000001400000 0x0000000001400000
>                   0x00000000003dd46c 0x00000000003e1000  RWE    0x1000
>    GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
>                   0x0000000000000000 0x0000000000000000  RWE    0x10
>
>   Section to Segment mapping:
>    Segment Sections...
>     00
>     01     .head.text .opd .dlt
>     02     .text .data .rodata .eh_frame .bss
>     03
>
> The two LOAD sections corresponding to what PALO actually loads. The
> problem happens if the length of the first load section is bigger than
> 0x20000.

What exactly is the problem if the first section is bigger than 0x20000?

> Now if you look what happens after your change:
> jejb@ion:~/git/linux-build/build/parisc64/arch/parisc/boot> readelf -l bzImage

Ok - bzImage is the same as ./vmlinuz.

> Elf file type is EXEC (Executable file)
> Entry point 0xe0000
> There are 4 program headers, starting at offset 64
>
> Program Headers:
>    Type           Offset             VirtAddr           PhysAddr
>                   FileSiz            MemSiz              Flags  Align
>    PHDR           0x0000000000000040 0x0000000000001040 0x0000000000000000
>                   0x00000000000000e0 0x00000000000000e0  R E    0x8
>    LOAD           0x0000000000001000 0x00000000000e0000 0x00000000000e0000
>                   0x00000000004ae760 0x00000000004ae760  RWE    0x1000
>    LOAD           0x00000000004b0000 0x000000000118a000 0x000000000118a000
>                   0x0000000000006044 0x000000000000a000  RWE    0x1000
>    GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
>                   0x0000000000000000 0x0000000000000000  RWE    0x10
>
>   Section to Segment mapping:
>    Segment Sections...
>     00
>     01     .head.text .opd .dlt .rodata.compressed
>     02     .text .data .rodata .eh_frame .bss
>     03
>
> So the first section tries to load between 0x000e0000-0x0058e760 and
> that's overwritten at 0x00100000 when the decompression starts because
> 0x00100000 is our KERNEL_BINARY_TEXT_START.

The decompression decompresses the image from .rodata.compressed
to an area behind .bss.
So, "vmlinux" ends up behind .bss for further processing.
This "vmlinux" (which can have multiple ELF sections) is then started at the high address.
That address is way above the 0x00100000 or KERNEL_BINARY_TEXT_START.
It then finally moves itself (the ELF sections) to 0x00100000.

> The result for me is that
> I get the Decompressing linux ... message followed by a HPMC.

It actually does boot for me and Sven without a HPMC.
The decompression is slow (~40 seconds on my c3000 for 160MB).
I still *believe* you are facing a HPMC for other reasons.
Which machine are you booting on?
How much memory does it have?

Helge
Helge Deller July 31, 2019, 9:51 p.m. UTC | #14
On 31.07.19 23:13, Helge Deller wrote:
> On 31.07.19 23:08, Sven Schnelle wrote:
>> Hi,
>>
>> On Wed, Jul 31, 2019 at 02:01:34PM -0700, James Bottomley wrote:
>>> On Wed, 2019-07-31 at 21:44 +0200, Sven Schnelle wrote:
>>>> Hi James,
>>>>
>>>> On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
>>>>
>>>>> What about causing the compressed make to build both a stripped and
>>>>> a non-stripped bzImage (say sbzImage and bzImage).  That way you
>>>>> always have the stripped one available for small size things like
>>>>> boot from tape or DVD?  but in the usual case we use the bzImage
>>>>> with full contents.
>>>>
>>>> In that case we would also need to build two lifimages - how about
>>>> adding a config option? Something like "Strip debug
>>>> information from compressed kernel images"?
>>>
>>> Actually, I just looked at what x86 does.  It has this in the
>>> arch/x86/boot/compressed/Makefile:
>>>
>>> OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
>>> $(obj)/vmlinux.bin: vmlinux FORCE
>>>     $(call if_changed,objcopy)
>>>
>>> So it basically strips all the debug information from the kernel before
>>> compressing, which argues there's no need to retain the information
>>> because x86 doesn't bother.
>>
>> Nice. So we could convince Helge by saying "Look, x86 is also stripping it"! :-)
>
> I'm fine with doing exactly what x86 does :-)

Attached is the revised patch, and it gets the compressed kernel down
from 32MB to 3.8MB.

Helge
James Bottomley Aug. 1, 2019, 1:37 a.m. UTC | #15
On Wed, 2019-07-31 at 23:44 +0200, Helge Deller wrote:
> On 31.07.19 22:49, James Bottomley wrote:
> > On Wed, 2019-07-31 at 22:19 +0200, Helge Deller wrote:
> > > On 31.07.19 21:56, James Bottomley wrote:
> > > > On Wed, 2019-07-31 at 21:46 +0200, Helge Deller wrote:
> > > > > On 31.07.19 21:44, Sven Schnelle wrote:
> > > > > > Hi James,
> > > > > > 
> > > > > > On Wed, Jul 31, 2019 at 12:40:12PM -0700, James Bottomley wrote:
> > > > > > 
> > > > > > > What about causing the compressed make to build both a
> > > > > > > stripped and a non-stripped bzImage (say sbzImage and
> > > > > > > bzImage).  That way you always have the stripped one
> > > > > > > available for small size things like boot from tape or
> > > > > > > DVD?  but in the usual case we use the bzImage with full
> > > > > > > contents.
> > > > > > 
> > > > > > In that case we would also need to build two lifimages - how
> > > > > > about adding a config option? Something like "Strip debug
> > > > > > information from compressed kernel images"?
> > > > > 
> > > > > I agree, two lifimages don't make sense. Only one vmlinuz gets
> > > > > installed. Instead of the config option, I think my latest patch
> > > > > is better.
> > > > 
> > > > It doesn't solve the problem that if a stripped compressed image
> > > > is > 128kb then it overwrites the decompress area starting at
> > > > 0x00100000 so we can't decompress the end because we've already
> > > > overwritten it before the decompressor gets to it.
> > > 
> > > I don't get this point.
> > >    hppa64-linux-gnu-objdump -h vmlinuz
> > > shows:
> > > Sections:
> > > Idx Name          Size      VMA               LMA               File off  Algn
> > >   0 .head.text    00000084  00000000000e0000  00000000000e0000  00001000  2**2
> > >                   CONTENTS, ALLOC, LOAD, READONLY, CODE
> > >   1 .opd          00000340  00000000000e0090  00000000000e0090  00001090  2**3
> > >                   CONTENTS, ALLOC, LOAD, DATA
> > >   2 .dlt          00000160  00000000000e03d0  00000000000e03d0  000013d0  2**3
> > >                   CONTENTS, ALLOC, LOAD, DATA
> > >   3 .rodata.compressed 01f3c2b0  00000000000e0530  00000000000e0530  00001530  2**0
> > >                   CONTENTS, ALLOC, LOAD, DATA
> > >   4 .text         00005cc0  000000000201d000  000000000201d000  01f3e000  2**7
> > >                   CONTENTS, ALLOC, LOAD, READONLY, CODE
> > >   5 .data         00000060  0000000002022cc0  0000000002022cc0  01f43cc0  2**3
> > >                   CONTENTS, ALLOC, LOAD, DATA
> > > 
> > > Only .head.text gets loaded at e0000, and it is basically just a few
> > > bytes which sets-up registers and jump to .text segment (at 0201d000
> > > in this case).
> > 
> > Actually, you're looking at the wrong thing, you want to look at the
> > program header (the segments) not the section header.  It's the program
> > header we load.  If I extract this from the current debian kernel we
> > get
> > 
> > jejb@ion:~/git/linux-build/arch/parisc/boot/compressed> readelf -l /boot/vmlinuz-4.19.0-5-parisc64-smp
> > 
> > Elf file type is EXEC (Executable file)
> > Entry point 0xe0000
> > There are 4 program headers, starting at offset 64
> > 
> > Program Headers:
> >    Type           Offset             VirtAddr           PhysAddr
> >                   FileSiz            MemSiz              Flags  Align
> >    PHDR           0x0000000000000040 0x0000000000001040 0x0000000000000000
> >                   0x00000000000000e0 0x00000000000000e0  R E    0x8
> >    LOAD           0x0000000000001000 0x00000000000e0000 0x00000000000e0000
> >                   0x00000000000004d8 0x00000000000004d8  RWE    0x1000
> >    LOAD           0x0000000000002000 0x0000000001400000 0x0000000001400000
> >                   0x00000000003dd46c 0x00000000003e1000  RWE    0x1000
> >    GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
> >                   0x0000000000000000 0x0000000000000000  RWE    0x10
> > 
> >   Section to Segment mapping:
> >    Segment Sections...
> >     00
> >     01     .head.text .opd .dlt
> >     02     .text .data .rodata .eh_frame .bss
> >     03
> > 
> > The two LOAD sections correspond to what PALO actually loads.  The
> > problem happens if the length of the first load section is bigger than
> > 0x20000.
> 
> What exactly is the problem if the first section is bigger than 0x20000?
> 
> > Now if you look what happens after your change:
> > jejb@ion:~/git/linux-build/build/parisc64/arch/parisc/boot> readelf -l bzImage
> 
> Ok - bzImage is the same as ./vmlinuz.
> 
> > Elf file type is EXEC (Executable file)
> > Entry point 0xe0000
> > There are 4 program headers, starting at offset 64
> > 
> > Program Headers:
> >    Type           Offset             VirtAddr           PhysAddr
> >                   FileSiz            MemSiz              Flags  Align
> >    PHDR           0x0000000000000040 0x0000000000001040 0x0000000000000000
> >                   0x00000000000000e0 0x00000000000000e0  R E    0x8
> >    LOAD           0x0000000000001000 0x00000000000e0000 0x00000000000e0000
> >                   0x00000000004ae760 0x00000000004ae760  RWE    0x1000
> >    LOAD           0x00000000004b0000 0x000000000118a000 0x000000000118a000
> >                   0x0000000000006044 0x000000000000a000  RWE    0x1000
> >    GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
> >                   0x0000000000000000 0x0000000000000000  RWE    0x10
> > 
> >   Section to Segment mapping:
> >    Segment Sections...
> >     00
> >     01     .head.text .opd .dlt .rodata.compressed
> >     02     .text .data .rodata .eh_frame .bss
> >     03
> > 
> > So the first section tries to load between 0x000e0000-0x0058e760 and
> > that's overwritten at 0x00100000 when the decompression starts because
> > 0x00100000 is our KERNEL_BINARY_TEXT_START.
> 
> The decompression decompresses the image from .rodata.compressed
> to an area behind .bss.
> So, "vmlinux" ends up behind .bss for further processing.
> This "vmlinux" (which can have multiple ELF sections) is then started
> at the high address.
> That address is way above the 0x00100000 or KERNEL_BINARY_TEXT_START.
> It then finally moves itself (the ELF sections) to 0x00100000.
> 
> > The result for me is that
> > I get the Decompressing linux ... message followed by a HPMC.
> 
> It actually does boot for me and Sven without a HPMC.
> The decompression is slow (~40 seconds on my c3000 for 160MB).
> I still *believe* you are facing a HPMC because of other reasons.
> On which machine do you start?
> How much memory?

This turned out to be a very eccentric bug.  Apparently we don't have
an archclean target in our arch/parisc/Makefile, so files in there
never get cleaned out by make mrproper.  This, in turn, means that the
sizes.h file in arch/parisc/boot/compressed never gets removed and,
worse, when you transition to an O=build/parisc[64] build model it
overrides the generated file.  The upshot was that my bzImage was
building with a SZ_end that was too small.
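
Why a stale SZ_end matters can be sketched from the linker script expression quoted in the patch at the end of this thread, `. = (SZ_end - SZparisc_kernel_start + KERNEL_BINARY_TEXT_START)`. This is an illustration only: the SZ_* values below are made-up placeholders (the real ones come from the generated sizes.h), and the helper names are hypothetical.

```python
KERNEL_BINARY_TEXT_START = 0x00100000


def bootloader_start(sz_end, sz_kernel_start):
    """Address where the linker script places the decompressor's own
    .text/.data/.bss: just behind the area reserved for the extracted kernel."""
    return sz_end - sz_kernel_start + KERNEL_BINARY_TEXT_START


def implied_extracted_span(second_load_vaddr):
    """Invert the expression: how much room a given image reserves."""
    return second_load_vaddr - KERNEL_BINARY_TEXT_START


# Hypothetical values, chosen only to illustrate the failure mode.
SZ_PARISC_KERNEL_START = 0x40100000  # assumption, not taken from this thread
STALE_SZ_END = 0x4118A000            # assumption: left over from an older build
FRESH_SZ_END = 0x41400000            # assumption: what the current vmlinux needs

stale_start = bootloader_start(STALE_SZ_END, SZ_PARISC_KERNEL_START)
fresh_start = bootloader_start(FRESH_SZ_END, SZ_PARISC_KERNEL_START)

# With the stale value the decompressor sits below the end of the area the
# current kernel needs, so extracting the kernel would run into it.
print("stale placement:", hex(stale_start))
print("fresh placement:", hex(fresh_start))
print("decompressor inside kernel area:", stale_start < fresh_start)
```

Applied to the bzImage above, whose second LOAD segment sits at 0x118a000, the inverse calculation gives a reserved span of 0x108a000 bytes. Whether that was too small depends on the real size of the vmlinux being compressed.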

I fixed it by making mrproper clean everything.

James

---

diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 8acb8fa1f8d6..945952166468 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -182,5 +182,8 @@ define archhelp
 	@echo  '  zinstall	- Install compressed vmlinuz kernel'
 endef
 
+archclean:
+	$(Q)$(MAKE) $(clean)=$(boot)
+
 archheaders:
 	$(Q)$(MAKE) $(build)=arch/parisc/kernel/syscalls all
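
The stale-file situation described above (a leftover sizes.h in the source tree next to a freshly generated one in the O= build directory) can be detected with a small check. This is a minimal sketch, not part of kbuild: the function name and the demo directory layout are assumptions, and whether the source-tree copy actually takes precedence depends on the include path order.

```python
from pathlib import Path
import tempfile


def find_shadowing_files(src_root, build_root, rel_paths):
    """Return the relative paths that exist in BOTH the source tree and the
    out-of-tree build directory; the source-tree copies are stale candidates."""
    src_root, build_root = Path(src_root), Path(build_root)
    return [rel for rel in rel_paths
            if (src_root / rel).is_file() and (build_root / rel).is_file()]


# Demo on a throwaway directory tree that mimics the layout from the report.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "linux"                # hypothetical source tree
    out = Path(tmp) / "build" / "parisc64"   # hypothetical O= directory
    rel = "arch/parisc/boot/compressed/sizes.h"
    for root in (src, out):
        (root / rel).parent.mkdir(parents=True)
        (root / rel).write_text("/* placeholder */\n")
    stale = find_shadowing_files(src, out, [rel])
    print("generated files also present in the source tree:", stale)
```

With the archclean target above in place, make mrproper removes such leftovers from arch/parisc/boot, so a check like this should come back empty afterwards.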
Sven Schnelle Aug. 1, 2019, 8:10 a.m. UTC | #16
Hi Helge,

On Wed, Jul 31, 2019 at 11:51:16PM +0200, Helge Deller wrote:
> 
> Attached is the revised patch, and it gets the compressed kernel down
> from 32MB to 3.8MB.
> 

Works for me, thanks!

Regards
Sven

Patch

diff --git a/arch/parisc/boot/compressed/vmlinux.lds.S b/arch/parisc/boot/compressed/vmlinux.lds.S
index bfd7872739a3..5841aa373c03 100644
--- a/arch/parisc/boot/compressed/vmlinux.lds.S
+++ b/arch/parisc/boot/compressed/vmlinux.lds.S
@@ -42,12 +42,6 @@  SECTIONS
 #endif
 	_startcode_end = .;
 
-	/* vmlinux.bin.gz is here */
-	. = ALIGN(8);
-	.rodata.compressed : {
-		*(.rodata.compressed)
-	}
-
 	/* bootloader code and data starts behind area of extracted kernel */
 	. = (SZ_end - SZparisc_kernel_start + KERNEL_BINARY_TEXT_START);
 
@@ -73,6 +67,12 @@  SECTIONS
 		*(.rodata.*)
 		_erodata = . ;
 	}
+	/* vmlinux.bin.gz is here */
+	. = ALIGN(8);
+	.rodata.compressed : {
+		*(.rodata.compressed)
+	}
+
 	. = ALIGN(8);
 	.bss : {
 		_bss = . ;