[3/4] ARM: merge -fdata-sections BSS data to .bss section

Message ID E1gK1ae-0000or-7R@rmk-PC.armlinux.org.uk
State New
Series
  • Enable deadcode elimination at link time

Commit Message

Russell King Nov. 6, 2018, 1:40 p.m. UTC
When building with -fdata-sections, the BSS data is placed into separate
sections, so we need to ask the linker to group these together into the
.bss output section, so that they're correctly placed.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
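
As a rough illustration of why the extra pattern is needed (a sketch, not
part of the patch): ld's input section patterns are shell-style globs, so
*(.bss) on its own does not pick up per-variable sections such as .bss.foo
that -fdata-sections emits, while *(.bss.*) does. The snippet below uses
POSIX fnmatch() as a stand-in for the linker's matcher, which is an
assumption made for the example; section_is_collected(), old_patterns and
new_patterns are hypothetical names.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if any of the glob patterns matches the
 * input section name, roughly mimicking how an output section statement
 * such as ".bss : { *(.bss) *(.bss.*) }" selects input sections by name.
 * fnmatch() stands in for the linker's own wildcard matching.
 */
static int section_is_collected(const char *const *patterns, size_t n,
                                const char *section)
{
    assert(patterns != NULL && section != NULL);

    for (size_t i = 0; i < n; i++) {
        if (fnmatch(patterns[i], section, 0) == 0)
            return 1;
    }
    return 0;
}

/* Patterns before and after the one-line change in this patch. */
static const char *const old_patterns[] = { ".bss" };
static const char *const new_patterns[] = { ".bss", ".bss.*" };
```

With old_patterns, a section named ".bss.foo" is not collected; with
new_patterns it is, and unrelated names such as ".data.foo" are still left
alone.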

Comments

Ard Biesheuvel Nov. 6, 2018, 2:08 p.m. UTC | #1
On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
> When building with -fdata-sections, the BSS data is placed into separate
> sections, so we need to ask the linker to group these together into the
> .bss output section, so that they're correctly placed.
>
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
>  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
> index 2b963d8e76dd..3b91bc3c606f 100644
> --- a/arch/arm/boot/compressed/vmlinux.lds.S
> +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> @@ -118,7 +118,7 @@ SECTIONS
>
>    . = BSS_START;
>    __bss_start = .;
> -  .bss                 : { *(.bss) }
> +  .bss                 : { *(.bss) *(.bss.*) }

Would it make sense to sort these by alignment? Otherwise, I suspect
you may get a lot of padding holes due to the number of different
input sections, each with its own alignment.

>    _end = .;
>
>    . = ALIGN(8);                /* the stack must be 64-bit aligned */
> --
> 2.7.4
Russell King - ARM Linux admin Nov. 6, 2018, 2:10 p.m. UTC | #2
On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
> > When building with -fdata-sections, the BSS data is placed into separate
> > sections, so we need to ask the linker to group these together into the
> > .bss output section, so that they're correctly placed.
> >
> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> > ---
> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
> > index 2b963d8e76dd..3b91bc3c606f 100644
> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> > @@ -118,7 +118,7 @@ SECTIONS
> >
> >    . = BSS_START;
> >    __bss_start = .;
> > -  .bss                 : { *(.bss) }
> > +  .bss                 : { *(.bss) *(.bss.*) }
> 
> Would it make sense to sort these by alignment? Otherwise, I suspect
> you may get a lot of padding holes due to the number of different
> input sections, each with its own alignment.

We don't bother elsewhere in the kernel linker script - do you have
a case where we get lots of padding?
Ard Biesheuvel Nov. 6, 2018, 2:13 p.m. UTC | #3
On 6 November 2018 at 15:10, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
>> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
>> > When building with -fdata-sections, the BSS data is placed into separate
>> > sections, so we need to ask the linker to group these together into the
>> > .bss output section, so that they're correctly placed.
>> >
>> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
>> > ---
>> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
>> > index 2b963d8e76dd..3b91bc3c606f 100644
>> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
>> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
>> > @@ -118,7 +118,7 @@ SECTIONS
>> >
>> >    . = BSS_START;
>> >    __bss_start = .;
>> > -  .bss                 : { *(.bss) }
>> > +  .bss                 : { *(.bss) *(.bss.*) }
>>
>> Would it make sense to sort these by alignment? Otherwise, I suspect
>> you may get a lot of padding holes due to the number of different
>> input sections, each with its own alignment.
>
> We don't bother elsewhere in the kernel linker script - do you have
> a case where we get lots of padding?
>

With -fdata-sections? How else are we ensuring that each .bss item in
its own section is not placed such that its alignment results in a
padding hole?

Without -fdata-sections, this is done per object, and GCC takes care
of this itself.
Russell King - ARM Linux admin Nov. 6, 2018, 2:29 p.m. UTC | #4
On Tue, Nov 06, 2018 at 03:13:07PM +0100, Ard Biesheuvel wrote:
> On 6 November 2018 at 15:10, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
> >> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >> > When building with -fdata-sections, the BSS data is placed into separate
> >> > sections, so we need to ask the linker to group these together into the
> >> > .bss output section, so that they're correctly placed.
> >> >
> >> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> >> > ---
> >> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
> >> > index 2b963d8e76dd..3b91bc3c606f 100644
> >> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
> >> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> >> > @@ -118,7 +118,7 @@ SECTIONS
> >> >
> >> >    . = BSS_START;
> >> >    __bss_start = .;
> >> > -  .bss                 : { *(.bss) }
> >> > +  .bss                 : { *(.bss) *(.bss.*) }
> >>
> >> Would it make sense to sort these by alignment? Otherwise, I suspect
> >> you may get a lot of padding holes due to the number of different
> >> input sections, each with its own alignment.
> >
> > We don't bother elsewhere in the kernel linker script - do you have
> > a case where we get lots of padding?
> >
> 
> With -fdata-sections? How else are we ensuring that each .bss item in
> its own section is not placed such that its alignment results in a
> padding hole?

Quite simply, we don't care (at the moment).  The alignment does not
come from the size of each individual section, but the alignment
requirements of data within the section.

See include/asm-generic/vmlinux.lds.h and the definition of *_MAIN
symbols from around line 60.
Ard Biesheuvel Nov. 6, 2018, 2:39 p.m. UTC | #5
On 6 November 2018 at 15:29, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Nov 06, 2018 at 03:13:07PM +0100, Ard Biesheuvel wrote:
>> On 6 November 2018 at 15:10, Russell King - ARM Linux
>> <linux@armlinux.org.uk> wrote:
>> > On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
>> >> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
>> >> > When building with -fdata-sections, the BSS data is placed into separate
>> >> > sections, so we need to ask the linker to group these together into the
>> >> > .bss output section, so that they're correctly placed.
>> >> >
>> >> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
>> >> > ---
>> >> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
>> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
>> >> > index 2b963d8e76dd..3b91bc3c606f 100644
>> >> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
>> >> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
>> >> > @@ -118,7 +118,7 @@ SECTIONS
>> >> >
>> >> >    . = BSS_START;
>> >> >    __bss_start = .;
>> >> > -  .bss                 : { *(.bss) }
>> >> > +  .bss                 : { *(.bss) *(.bss.*) }
>> >>
>> >> Would it make sense to sort these by alignment? Otherwise, I suspect
>> >> you may get a lot of padding holes due to the number of different
>> >> input sections, each with its own alignment.
>> >
>> > We don't bother elsewhere in the kernel linker script - do you have
>> > a case where we get lots of padding?
>> >
>>
>> With -fdata-sections? How else are we ensuring that each .bss item in
>> its own section is not placed such that its alignment results in a
>> padding hole?
>
> Quite simply, we don't care (at the moment).  The alignment does not
> come from the size of each individual section, but the alignment
> requirements of data within the section.
>
> See include/asm-generic/vmlinux.lds.h and the definition of *_MAIN
> symbols from around line 60.
>

This seems like an oversight to me: consider something like*

unsigned char bar = 1;
unsigned long long foo = 4;

build it with

arm-linux-gnueabihf-gcc -fdata-sections -o /tmp/foo.o -c /tmp/foo.c

and we end up with the following object file

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
  [ 2] .data             PROGBITS        00000000 000034 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          00000000 000034 000000 00  WA  0   0  1
  [ 4] .data.bar         PROGBITS        00000000 000034 000001 00  WA  0   0  1
  [ 5] .data.foo         PROGBITS        00000000 000038 000008 00  WA  0   0  8
  ...

Symbol table '.symtab' contains 13 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
    11: 00000000     8 OBJECT  GLOBAL DEFAULT    5 foo
    12: 00000000     1 OBJECT  GLOBAL DEFAULT    4 bar

whereas if I remove the -fdata-sections argument, I get

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
  [ 2] .data             PROGBITS        00000000 000038 000009 00  WA  0   0  8
  [ 3] .bss              NOBITS          00000000 000041 000000 00  WA  0   0  1
  ...

and

Symbol table '.symtab' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
   ...
     9: 00000000     8 OBJECT  GLOBAL DEFAULT    2 foo
    10: 00000008     1 OBJECT  GLOBAL DEFAULT    2 bar

In other words, we end up with a 7 byte padding hole by switching to
-fdata-sections, unless we sort the input objects by alignment.

* for my test compile, -fdata-sections would not produce different
input sections, so I am using .data for this example.
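
The padding arithmetic in this example can be sketched in C. This is only a
model of back-to-back placement with and without sorting by descending
alignment (the order GNU ld documents for alignment sorting), not the
linker's actual code; struct input_section, align_up(), layout_size() and
sort_by_alignment() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One input section: its size and required alignment, both in bytes. */
struct input_section {
    const char *name;
    size_t size;
    size_t align;   /* must be a power of two */
};

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Place the sections back to back in array order and return the number of
 * bytes used, including padding inserted to satisfy each alignment.
 */
static size_t layout_size(const struct input_section *s, size_t n)
{
    size_t offset = 0;

    for (size_t i = 0; i < n; i++)
        offset = align_up(offset, s[i].align) + s[i].size;
    return offset;
}

/* qsort comparator: larger alignment first. */
static int by_alignment_desc(const void *a, const void *b)
{
    const struct input_section *x = a, *y = b;

    return (x->align < y->align) - (x->align > y->align);
}

/* Reorder the sections so that the most strictly aligned come first. */
static void sort_by_alignment(struct input_section *s, size_t n)
{
    qsort(s, n, sizeof(*s), by_alignment_desc);
}
```

For the pair above (.data.bar: size 1, align 1; .data.foo: size 8, align 8),
layout_size() gives 16 bytes in source order and 9 bytes after
sort_by_alignment(), which agrees with the 0x9 .data size in the build
without -fdata-sections.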
Russell King - ARM Linux admin Nov. 6, 2018, 2:45 p.m. UTC | #6
On Tue, Nov 06, 2018 at 03:39:58PM +0100, Ard Biesheuvel wrote:
> On 6 November 2018 at 15:29, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Tue, Nov 06, 2018 at 03:13:07PM +0100, Ard Biesheuvel wrote:
> >> On 6 November 2018 at 15:10, Russell King - ARM Linux
> >> <linux@armlinux.org.uk> wrote:
> >> > On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
> >> >> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >> >> > When building with -fdata-sections, the BSS data is placed into separate
> >> >> > sections, so we need to ask the linker to group these together into the
> >> >> > .bss output section, so that they're correctly placed.
> >> >> >
> >> >> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> >> >> > ---
> >> >> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
> >> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >> >
> >> >> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
> >> >> > index 2b963d8e76dd..3b91bc3c606f 100644
> >> >> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
> >> >> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> >> >> > @@ -118,7 +118,7 @@ SECTIONS
> >> >> >
> >> >> >    . = BSS_START;
> >> >> >    __bss_start = .;
> >> >> > -  .bss                 : { *(.bss) }
> >> >> > +  .bss                 : { *(.bss) *(.bss.*) }
> >> >>
> >> >> Would it make sense to sort these by alignment? Otherwise, I suspect
> >> >> you may get a lot of padding holes due to the number of different
> >> >> input sections, each with its own alignment.
> >> >
> >> > We don't bother elsewhere in the kernel linker script - do you have
> >> > a case where we get lots of padding?
> >> >
> >>
> >> With -fdata-sections? How else are we ensuring that each .bss item in
> >> its own section is not placed such that its alignment results in a
> >> padding hole?
> >
> > Quite simply, we don't care (at the moment).  The alignment does not
> > come from the size of each individual section, but the alignment
> > requirements of data within the section.
> >
> > See include/asm-generic/vmlinux.lds.h and the definition of *_MAIN
> > symbols from around line 60.
> >
> 
> This seems like an oversight to me: consider something like*
> 
> unsigned char bar = 1;
> unsigned long long foo = 4;
> 
> build it with
> 
> arm-linux-gnueabihf-gcc -fdata-sections -o /tmp/foo.o -c /tmp/foo.c
> 
> and we end up with the following object file
> 
> Section Headers:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
>   [ 2] .data             PROGBITS        00000000 000034 000000 00  WA  0   0  1
>   [ 3] .bss              NOBITS          00000000 000034 000000 00  WA  0   0  1
>   [ 4] .data.bar         PROGBITS        00000000 000034 000001 00  WA  0   0  1
>   [ 5] .data.foo         PROGBITS        00000000 000038 000008 00  WA  0   0  8
>   ...
> 
> Symbol table '.symtab' contains 13 entries:
>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>     11: 00000000     8 OBJECT  GLOBAL DEFAULT    5 foo
>     12: 00000000     1 OBJECT  GLOBAL DEFAULT    4 bar
> 
> whereas if I remove the -fdata-sections argument, I get
> 
> Section Headers:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
>   [ 2] .data             PROGBITS        00000000 000038 000009 00  WA  0   0  8
>   [ 3] .bss              NOBITS          00000000 000041 000000 00  WA  0   0  1
>   ...
> 
> and
> 
> Symbol table '.symtab' contains 11 entries:
>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>    ...
>      9: 00000000     8 OBJECT  GLOBAL DEFAULT    2 foo
>     10: 00000008     1 OBJECT  GLOBAL DEFAULT    2 bar
> 
> In other words, we end up with a 7 byte padding hole by switching to
> -fdata-sections, unless we sort the input objects by alignment.
> 
> * for my test compile, -fdata-sections would not produce different
> input sections, so I am using .data for this example.

Nevertheless, changing this is not a subject for a patch series that
targets ARM.  What you've identified is a deficiency in the generic
cross-arch support for this which needs a wider audience to be
involved.
Ard Biesheuvel Nov. 6, 2018, 3:18 p.m. UTC | #7
On 6 November 2018 at 15:45, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Nov 06, 2018 at 03:39:58PM +0100, Ard Biesheuvel wrote:
>> On 6 November 2018 at 15:29, Russell King - ARM Linux
>> <linux@armlinux.org.uk> wrote:
>> > On Tue, Nov 06, 2018 at 03:13:07PM +0100, Ard Biesheuvel wrote:
>> >> On 6 November 2018 at 15:10, Russell King - ARM Linux
>> >> <linux@armlinux.org.uk> wrote:
>> >> > On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
>> >> >> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
>> >> >> > When building with -fdata-sections, the BSS data is placed into separate
>> >> >> > sections, so we need to ask the linker to group these together into the
>> >> >> > .bss output section, so that they're correctly placed.
>> >> >> >
>> >> >> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
>> >> >> > ---
>> >> >> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
>> >> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >> >
>> >> >> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
>> >> >> > index 2b963d8e76dd..3b91bc3c606f 100644
>> >> >> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
>> >> >> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
>> >> >> > @@ -118,7 +118,7 @@ SECTIONS
>> >> >> >
>> >> >> >    . = BSS_START;
>> >> >> >    __bss_start = .;
>> >> >> > -  .bss                 : { *(.bss) }
>> >> >> > +  .bss                 : { *(.bss) *(.bss.*) }
>> >> >>
>> >> >> Would it make sense to sort these by alignment? Otherwise, I suspect
>> >> >> you may get a lot of padding holes due to the number of different
>> >> >> input sections, each with its own alignment.
>> >> >
>> >> > We don't bother elsewhere in the kernel linker script - do you have
>> >> > a case where we get lots of padding?
>> >> >
>> >>
>> >> With -fdata-sections? How else are we ensuring that each .bss item in
>> >> its own section is not placed such that its alignment results in a
>> >> padding hole?
>> >
>> > Quite simply, we don't care (at the moment).  The alignment does not
>> > come from the size of each individual section, but the alignment
>> > requirements of data within the section.
>> >
>> > See include/asm-generic/vmlinux.lds.h and the definition of *_MAIN
>> > symbols from around line 60.
>> >
>>
>> This seems like an oversight to me: consider something like*
>>
>> unsigned char bar = 1;
>> unsigned long long foo = 4;
>>
>> build it with
>>
>> arm-linux-gnueabihf-gcc -fdata-sections -o /tmp/foo.o -c /tmp/foo.c
>>
>> and we end up with the following object file
>>
>> Section Headers:
>>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
>>   [ 2] .data             PROGBITS        00000000 000034 000000 00  WA  0   0  1
>>   [ 3] .bss              NOBITS          00000000 000034 000000 00  WA  0   0  1
>>   [ 4] .data.bar         PROGBITS        00000000 000034 000001 00  WA  0   0  1
>>   [ 5] .data.foo         PROGBITS        00000000 000038 000008 00  WA  0   0  8
>>   ...
>>
>> Symbol table '.symtab' contains 13 entries:
>>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>>     11: 00000000     8 OBJECT  GLOBAL DEFAULT    5 foo
>>     12: 00000000     1 OBJECT  GLOBAL DEFAULT    4 bar
>>
>> whereas if I remove the -fdata-sections argument, I get
>>
>> Section Headers:
>>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
>>   [ 2] .data             PROGBITS        00000000 000038 000009 00  WA  0   0  8
>>   [ 3] .bss              NOBITS          00000000 000041 000000 00  WA  0   0  1
>>   ...
>>
>> and
>>
>> Symbol table '.symtab' contains 11 entries:
>>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>>    ...
>>      9: 00000000     8 OBJECT  GLOBAL DEFAULT    2 foo
>>     10: 00000008     1 OBJECT  GLOBAL DEFAULT    2 bar
>>
>> In other words, we end up with a 7 byte padding hole by switching to
>> -fdata-sections, unless we sort the input objects by alignment.
>>
>> * for my test compile, -fdata-sections would not produce different
>> input sections, so I am using .data for this example.
>
> Nevertheless, changing this is not a subject for a patch series that
> targets ARM.  What you've identified is a deficiency in the generic
> cross-arch support for this which needs a wider audience to be
> involved.
>

Fair enough.

For the record, when I build the arm64 defconfig kernel with
LD_DEAD_CODE_ELIMINATION turned on, adding --sort-section=alignment
reduces the image size from

   text    data     bss     dec     hex filename
11951684 6936332 385040 19273056 1261560 vmlinux

to

   text    data     bss     dec     hex filename
11938452 6930404 383625 19252481 125c501 vmlinux
Russell King - ARM Linux admin Nov. 6, 2018, 4:06 p.m. UTC | #8
On Tue, Nov 06, 2018 at 04:18:46PM +0100, Ard Biesheuvel wrote:
> On 6 November 2018 at 15:45, Russell King - ARM Linux
> <linux@armlinux.org.uk> wrote:
> > On Tue, Nov 06, 2018 at 03:39:58PM +0100, Ard Biesheuvel wrote:
> >> On 6 November 2018 at 15:29, Russell King - ARM Linux
> >> <linux@armlinux.org.uk> wrote:
> >> > On Tue, Nov 06, 2018 at 03:13:07PM +0100, Ard Biesheuvel wrote:
> >> >> On 6 November 2018 at 15:10, Russell King - ARM Linux
> >> >> <linux@armlinux.org.uk> wrote:
> >> >> > On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
> >> >> >> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
> >> >> >> > When building with -fdata-sections, the BSS data is placed into separate
> >> >> >> > sections, so we need to ask the linker to group these together into the
> >> >> >> > .bss output section, so that they're correctly placed.
> >> >> >> >
> >> >> >> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> >> >> >> > ---
> >> >> >> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
> >> >> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >> >> >
> >> >> >> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
> >> >> >> > index 2b963d8e76dd..3b91bc3c606f 100644
> >> >> >> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
> >> >> >> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
> >> >> >> > @@ -118,7 +118,7 @@ SECTIONS
> >> >> >> >
> >> >> >> >    . = BSS_START;
> >> >> >> >    __bss_start = .;
> >> >> >> > -  .bss                 : { *(.bss) }
> >> >> >> > +  .bss                 : { *(.bss) *(.bss.*) }
> >> >> >>
> >> >> >> Would it make sense to sort these by alignment? Otherwise, I suspect
> >> >> >> you may get a lot of padding holes due to the number of different
> >> >> >> input sections, each with its own alignment.
> >> >> >
> >> >> > We don't bother elsewhere in the kernel linker script - do you have
> >> >> > a case where we get lots of padding?
> >> >> >
> >> >>
> >> >> With -fdata-sections? How else are we ensuring that each .bss item in
> >> >> its own section is not placed such that its alignment results in a
> >> >> padding hole?
> >> >
> >> > Quite simply, we don't care (at the moment).  The alignment does not
> >> > come from the size of each individual section, but the alignment
> >> > requirements of data within the section.
> >> >
> >> > See include/asm-generic/vmlinux.lds.h and the definition of *_MAIN
> >> > symbols from around line 60.
> >> >
> >>
> >> This seems like an oversight to me: consider something like*
> >>
> >> unsigned char bar = 1;
> >> unsigned long long foo = 4;
> >>
> >> build it with
> >>
> >> arm-linux-gnueabihf-gcc -fdata-sections -o /tmp/foo.o -c /tmp/foo.c
> >>
> >> and we end up with the following object file
> >>
> >> Section Headers:
> >>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
> >>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
> >>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
> >>   [ 2] .data             PROGBITS        00000000 000034 000000 00  WA  0   0  1
> >>   [ 3] .bss              NOBITS          00000000 000034 000000 00  WA  0   0  1
> >>   [ 4] .data.bar         PROGBITS        00000000 000034 000001 00  WA  0   0  1
> >>   [ 5] .data.foo         PROGBITS        00000000 000038 000008 00  WA  0   0  8
> >>   ...
> >>
> >> Symbol table '.symtab' contains 13 entries:
> >>    Num:    Value  Size Type    Bind   Vis      Ndx Name
> >>     11: 00000000     8 OBJECT  GLOBAL DEFAULT    5 foo
> >>     12: 00000000     1 OBJECT  GLOBAL DEFAULT    4 bar
> >>
> >> whereas if I remove the -fdata-sections argument, I get
> >>
> >> Section Headers:
> >>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
> >>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
> >>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
> >>   [ 2] .data             PROGBITS        00000000 000038 000009 00  WA  0   0  8
> >>   [ 3] .bss              NOBITS          00000000 000041 000000 00  WA  0   0  1
> >>   ...
> >>
> >> and
> >>
> >> Symbol table '.symtab' contains 11 entries:
> >>    Num:    Value  Size Type    Bind   Vis      Ndx Name
> >>    ...
> >>      9: 00000000     8 OBJECT  GLOBAL DEFAULT    2 foo
> >>     10: 00000008     1 OBJECT  GLOBAL DEFAULT    2 bar
> >>
> >> In other words, we end up with a 7 byte padding hole by switching to
> >> -fdata-sections, unless we sort the input objects by alignment.
> >>
> >> * for my test compile, -fdata-sections would not produce different
> >> input sections, so I am using .data for this example.
> >
> > Nevertheless, changing this is not a subject for a patch series that
> > targets ARM.  What you've identified is a deficiency in the generic
> > cross-arch support for this which needs a wider audience to be
> > involved.
> >
> 
> Fair enough.
> 
> For the record, when I build the arm64 defconfig kernel with
> LD_DEAD_CODE_ELIMINATION turned on, adding --sort-section=alignment
> reduces the image size from
> 
>    text    data     bss     dec     hex filename
> 11951684 6936332 385040 19273056 1261560 vmlinux
> 
> to
> 
>    text    data     bss     dec     hex filename
> 11938452 6930404 383625 19252481 125c501 vmlinux

I'm not sure that's relevant when this thread is about 32bit ARM.
It's a little like building an x86 kernel and posting the results.
Two entirely different architectures with different results.

Here's the numbers for an imx6 ARM kernel:

   text    data     bss      dec     hex filename
8438980 3255060 9640528 21334568 1458a28 imx6-unpatched/vmlinux
6164770      84    4120  6168974  5e218e imx6-unpatched/arch/arm/boot/compressed/vmlinux
8438980 3255060 9640528 21334568 1458a28 imx6-dc-dis/vmlinux
6164770      84    4120  6168974  5e218e imx6-dc-dis/arch/arm/boot/compressed/vmlinux
8448960 3253388 9639992 21342340 145a884 imx6-dc-ena/vmlinux
6175233      84    4120  6179437  5e4a6d imx6-dc-ena/arch/arm/boot/compressed/vmlinux
8446700 3245700 9639212 21331612 1457e9c imx6-dc-sort/vmlinux
6172284      84    4120  6176488  5e3ee8 imx6-dc-sort/arch/arm/boot/compressed/vmlinux

The interesting thing here is that while the data sizes are reduced
by enabling this option, the text size actually _increases_ by
more than we save.  So the result is an overall bigger kernel.

Presumably this is because of the literal pools needing to be larger
in the .text segments, and more instructions necessary to access the
data as the compiler can no longer know the relative displacement
between neighbouring data items in the object file.

So, it seems dead-code elimination is not that useful on 32-bit ARM.
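
The differences between the vmlinux rows above can be worked out with a few
lines of C. This is just arithmetic on the figures quoted in this mail;
struct image_size, total(), total_delta() and the imx6_* constants are
hypothetical names for the example.

```c
#include <assert.h>

/* The text, data and bss columns reported by size(1) for one image. */
struct image_size {
    long text;
    long data;
    long bss;
};

/* Sum of the three columns, i.e. the "dec" column. */
static long total(struct image_size s)
{
    assert(s.text >= 0 && s.data >= 0 && s.bss >= 0);
    return s.text + s.data + s.bss;
}

/* Change in total size going from base to other (positive means growth). */
static long total_delta(struct image_size base, struct image_size other)
{
    return total(other) - total(base);
}

/* vmlinux figures from the table above. */
static const struct image_size imx6_unpatched = { 8438980, 3255060, 9640528 };
static const struct image_size imx6_dc_ena    = { 8448960, 3253388, 9639992 };
static const struct image_size imx6_dc_sort   = { 8446700, 3245700, 9639212 };
```

Relative to imx6-unpatched, imx6-dc-ena has text +9980, data -1672 and
bss -536, a net change of +7772 bytes; imx6-dc-sort has text +7720,
data -9360 and bss -1316, a net change of -2956 bytes. The compressed image
grows in both cases, by 10463 and 7514 bytes respectively.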
Ard Biesheuvel Nov. 6, 2018, 4:14 p.m. UTC | #9
On 6 November 2018 at 17:06, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Tue, Nov 06, 2018 at 04:18:46PM +0100, Ard Biesheuvel wrote:
>> On 6 November 2018 at 15:45, Russell King - ARM Linux
>> <linux@armlinux.org.uk> wrote:
>> > On Tue, Nov 06, 2018 at 03:39:58PM +0100, Ard Biesheuvel wrote:
>> >> On 6 November 2018 at 15:29, Russell King - ARM Linux
>> >> <linux@armlinux.org.uk> wrote:
>> >> > On Tue, Nov 06, 2018 at 03:13:07PM +0100, Ard Biesheuvel wrote:
>> >> >> On 6 November 2018 at 15:10, Russell King - ARM Linux
>> >> >> <linux@armlinux.org.uk> wrote:
>> >> >> > On Tue, Nov 06, 2018 at 03:08:06PM +0100, Ard Biesheuvel wrote:
>> >> >> >> On 6 November 2018 at 14:40, Russell King <rmk+kernel@armlinux.org.uk> wrote:
>> >> >> >> > When building with -fdata-sections, the BSS data is placed into separate
>> >> >> >> > sections, so we need to ask the linker to group these together into the
>> >> >> >> > .bss output section, so that they're correctly placed.
>> >> >> >> >
>> >> >> >> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
>> >> >> >> > ---
>> >> >> >> >  arch/arm/boot/compressed/vmlinux.lds.S | 2 +-
>> >> >> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >> >> >
>> >> >> >> > diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
>> >> >> >> > index 2b963d8e76dd..3b91bc3c606f 100644
>> >> >> >> > --- a/arch/arm/boot/compressed/vmlinux.lds.S
>> >> >> >> > +++ b/arch/arm/boot/compressed/vmlinux.lds.S
>> >> >> >> > @@ -118,7 +118,7 @@ SECTIONS
>> >> >> >> >
>> >> >> >> >    . = BSS_START;
>> >> >> >> >    __bss_start = .;
>> >> >> >> > -  .bss                 : { *(.bss) }
>> >> >> >> > +  .bss                 : { *(.bss) *(.bss.*) }
>> >> >> >>
>> >> >> >> Would it make sense to sort these by alignment? Otherwise, I suspect
>> >> >> >> you may get a lot of padding holes due to the number of different
>> >> >> >> input sections, each with its own alignment.
>> >> >> >
>> >> >> > We don't bother elsewhere in the kernel linker script - do you have
>> >> >> > a case where we get lots of padding?
>> >> >> >
>> >> >>
>> >> >> With -fdata-sections? How else are we ensuring that each .bss item in
>> >> >> its own section is not placed such that its alignment results in a
>> >> >> padding hole?
>> >> >
>> >> > Quite simply, we don't care (at the moment).  The alignment does not
>> >> > come from the size of each individual section, but the alignment
>> >> > requirements of data within the section.
>> >> >
>> >> > See include/asm-generic/vmlinux.lds.h and the definition of *_MAIN
>> >> > symbols from around line 60.
>> >> >
>> >>
>> >> This seems like an oversight to me: consider something like*
>> >>
>> >> unsigned char bar = 1;
>> >> unsigned long long foo = 4;
>> >>
>> >> build it with
>> >>
>> >> arm-linux-gnueabihf-gcc -fdata-sections -o /tmp/foo.o -c /tmp/foo.c
>> >>
>> >> and we end up with the following object file
>> >>
>> >> Section Headers:
>> >>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>> >>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>> >>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
>> >>   [ 2] .data             PROGBITS        00000000 000034 000000 00  WA  0   0  1
>> >>   [ 3] .bss              NOBITS          00000000 000034 000000 00  WA  0   0  1
>> >>   [ 4] .data.bar         PROGBITS        00000000 000034 000001 00  WA  0   0  1
>> >>   [ 5] .data.foo         PROGBITS        00000000 000038 000008 00  WA  0   0  8
>> >>   ...
>> >>
>> >> Symbol table '.symtab' contains 13 entries:
>> >>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>> >>     11: 00000000     8 OBJECT  GLOBAL DEFAULT    5 foo
>> >>     12: 00000000     1 OBJECT  GLOBAL DEFAULT    4 bar
>> >>
>> >> whereas if I remove the -fdata-sections argument, I get
>> >>
>> >> Section Headers:
>> >>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>> >>   [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>> >>   [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  1
>> >>   [ 2] .data             PROGBITS        00000000 000038 000009 00  WA  0   0  8
>> >>   [ 3] .bss              NOBITS          00000000 000041 000000 00  WA  0   0  1
>> >>   ...
>> >>
>> >> and
>> >>
>> >> Symbol table '.symtab' contains 11 entries:
>> >>    Num:    Value  Size Type    Bind   Vis      Ndx Name
>> >>    ...
>> >>      9: 00000000     8 OBJECT  GLOBAL DEFAULT    2 foo
>> >>     10: 00000008     1 OBJECT  GLOBAL DEFAULT    2 bar
>> >>
>> >> In other words, we end up with a 7 byte padding hole by switching to
>> >> -fdata-sections, unless we sort the input objects by alignment.
>> >>
>> >> * for my test compile, -fdata-sections would not produce different
>> >> input sections, so I am using .data for this example.
>> >
>> > Nevertheless, changing this is not a subject for a patch series that
>> > targets ARM.  What you've identified is a deficiency in the generic
>> > cross-arch support for this which needs a wider audience to be
>> > involved.
>> >
>>
>> Fair enough.
>>
>> For the record, when I build the arm64 defconfig kernel with
>> LD_DEAD_CODE_ELIMINATION turned on, adding --sort-section=alignment
>> reduces the image size from
>>
>>    text    data     bss     dec     hex filename
>> 11951684 6936332 385040 19273056 1261560 vmlinux
>>
>> to
>>
>>    text    data     bss     dec     hex filename
>> 11938452 6930404 383625 19252481 125c501 vmlinux
>
> I'm not sure that's relevant when this thread is about 32bit ARM.
> It's a little like building an x86 kernel and posting the results.
> Two entirely different architectures with different results.
>

You said it was a generic issue not an ARM issue, to which I replied
'fair enough', after which I shared some information that is relevant
to the generic issue under discussion.

> Here's the numbers for an imx6 ARM kernel:
>
>    text    data     bss      dec     hex filename
> 8438980 3255060 9640528 21334568 1458a28 imx6-unpatched/vmlinux
> 6164770      84    4120  6168974  5e218e imx6-unpatched/arch/arm/boot/compressed/vmlinux
> 8438980 3255060 9640528 21334568 1458a28 imx6-dc-dis/vmlinux
> 6164770      84    4120  6168974  5e218e imx6-dc-dis/arch/arm/boot/compressed/vmlinux
> 8448960 3253388 9639992 21342340 145a884 imx6-dc-ena/vmlinux
> 6175233      84    4120  6179437  5e4a6d imx6-dc-ena/arch/arm/boot/compressed/vmlinux
> 8446700 3245700 9639212 21331612 1457e9c imx6-dc-sort/vmlinux
> 6172284      84    4120  6176488  5e3ee8 imx6-dc-sort/arch/arm/boot/compressed/vmlinux
>
> The interesting thing here is that while the data sizes are reduced
> by enabling this option, the text size actually _increases_ by
> more than we save.  So the result is an overall bigger kernel.
>
> Presumably this is because of the literal pools needing to be larger
> in the .text segments, and more instructions necessary to access the
> data as the compiler can no longer know the relative displacement
> between neighbouring data items in the object file.
>

I guess the compiler can no longer share literal pools between
functions, so they get duplicated into each one.

> So, it seems dead-code elimination is not that useful on 32-bit ARM.
>

Agreed.

Patch

diff --git a/arch/arm/boot/compressed/vmlinux.lds.S b/arch/arm/boot/compressed/vmlinux.lds.S
index 2b963d8e76dd..3b91bc3c606f 100644
--- a/arch/arm/boot/compressed/vmlinux.lds.S
+++ b/arch/arm/boot/compressed/vmlinux.lds.S
@@ -118,7 +118,7 @@  SECTIONS
 
   . = BSS_START;
   __bss_start = .;
-  .bss			: { *(.bss) }
+  .bss			: { *(.bss) *(.bss.*) }
   _end = .;
 
   . = ALIGN(8);		/* the stack must be 64-bit aligned */