ARM: Do not allow unaligned accesses when CONFIG_ALIGNMENT_TRAP
diff mbox

Message ID 1306229953.19557.14.camel@e102109-lin.cambridge.arm.com
State New, archived
Headers show

Commit Message

Catalin Marinas May 24, 2011, 9:39 a.m. UTC
On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote:
> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote:
> > Newer versions of gcc generate unaligned accesses by default, causing
> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the
> > -mno-unaligned-access option to gcc.
> 
> This description doesn't make sense.  If we have alignment traps enabled,
> then we _expect_ to fix up unaligned loads and stores.
> 
> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled.
> 
> So what's the _actual_ problem that this is trying to address?  What's
> the panic/oops look like?  And that information should be in the commit
> description _anyway_.

Does the patch below look better?

We cannot move alignment_init() earlier as we don't know how early the
compiler would generate unaligned accesses. An alternative is some
#ifdef's in head.S. Please let me know which variant you prefer.


ARM: Do not allow unaligned accesses when CONFIG_ALIGNMENT_TRAP

From: Catalin Marinas <catalin.marinas@arm.com>

Newer versions of gcc generate unaligned accesses by default. For ARMv6
and newer architecture versions, Linux disables the alignment trapping
(SCTLR.A) in the alignment_init() function. However, unaligned accesses
may still happen before this function is called (currently
pcpu_dump_alloc_info causes a kernel panic).

This patch adds the -mno-unaligned-access option to gcc when
CONFIG_ALIGNMENT_TRAP is enabled.

Reported-by: Ali Saidi <ali.saidi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm/Makefile |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Comments

Mans Rullgard May 24, 2011, 2:17 p.m. UTC | #1
Catalin Marinas <catalin.marinas@arm.com> writes:

> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote:
>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote:
>> > Newer versions of gcc generate unaligned accesses by default, causing
>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the
>> > -mno-unaligned-access option to gcc.
>> 
>> This description doesn't make sense.  If we have alignment traps enabled,
>> then we _expect_ to fix up unaligned loads and stores.
>> 
>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled.
>> 
>> So what's the _actual_ problem that this is trying to address?  What's
>> the panic/oops look like?  And that information should be in the commit
>> description _anyway_.
>
> Does the patch below look better?
>
> We cannot move alignment_init() earlier as we don't know how early the
> compiler would generate unaligned accesses. An alternative is some
> #ifdef's in head.S. Please let me know which variant you prefer.

ifdefs may be ugly, but I don't see a better solution here.  Crippling
the entire build to make a couple of lines slightly more aesthetically
pleasing doesn't seem right to me.
Catalin Marinas May 24, 2011, 3:26 p.m. UTC | #2
2011/5/24 Måns Rullgård <mans@mansr.com>:
> Catalin Marinas <catalin.marinas@arm.com> writes:
>
>> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote:
>>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote:
>>> > Newer versions of gcc generate unaligned accesses by default, causing
>>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the
>>> > -mno-unaligned-access option to gcc.
>>>
>>> This description doesn't make sense.  If we have alignment traps enabled,
>>> then we _expect_ to fix up unaligned loads and stores.
>>>
>>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled.
>>>
>>> So what's the _actual_ problem that this is trying to address?  What's
>>> the panic/oops look like?  And that information should be in the commit
>>> description _anyway_.
>>
>> Does the patch below look better?
>>
>> We cannot move alignment_init() earlier as we don't know how early the
>> compiler would generate unaligned accesses. An alternative is some
>> #ifdef's in head.S. Please let me know which variant you prefer.
>
> ifdefs may be ugly, but I don't see a better solution here.  Crippling
> the entire build to make a couple of lines slightly more aesthetically
> pleasing doesn't seem right to me.

BTW, are we sure that the code generated with unaligned accesses is
better? AFAIK, while processors support unaligned accesses, they may
not always be optimal.
Mans Rullgard May 24, 2011, 4:23 p.m. UTC | #3
Catalin Marinas <catalin.marinas@arm.com> writes:

> 2011/5/24 Måns Rullgård <mans@mansr.com>:
>> Catalin Marinas <catalin.marinas@arm.com> writes:
>>
>>> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote:
>>>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote:
>>>> > Newer versions of gcc generate unaligned accesses by default, causing
>>>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the
>>>> > -mno-unaligned-access option to gcc.
>>>>
>>>> This description doesn't make sense.  If we have alignment traps enabled,
>>>> then we _expect_ to fix up unaligned loads and stores.
>>>>
>>>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled.
>>>>
>>>> So what's the _actual_ problem that this is trying to address?  What's
>>>> the panic/oops look like?  And that information should be in the commit
>>>> description _anyway_.
>>>
>>> Does the patch below look better?
>>>
>>> We cannot move alignment_init() earlier as we don't know how early the
>>> compiler would generate unaligned accesses. An alternative is some
>>> #ifdef's in head.S. Please let me know which variant you prefer.
>>
>> ifdefs may be ugly, but I don't see a better solution here.  Crippling
>> the entire build to make a couple of lines slightly more aesthetically
>> pleasing doesn't seem right to me.
>
> BTW, are we sure that the code generated with unaligned accesses is
> better? AFAIK, while processors support unaligned accesses, they may
> not always be optimal.

On Cortex-A8 unaligned loads not crossing a 128-bit boundary have no
penalty.  Crossing a 128-bit boundary stalls for 9 cycles (measured, 8
according to TRM).  On Cortex-A9, unaligned loads crossing a 64-bit
boundary delay 7 cycles (measured), other misalignments have no penalty.
On average these timings are faster than doing 4 byte loads and oring
them together, which uses 6 cycles on A9, 9 cycles on A8.

It may of course be the case that unaligned accesses are rare enough
that this isn't worth worrying about at all.
tip-bot for Dave Martin May 24, 2011, 5:13 p.m. UTC | #4
On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
> 2011/5/24 Måns Rullgård <mans@mansr.com>:
> > Catalin Marinas <catalin.marinas@arm.com> writes:
> >
> >> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote:
> >>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote:
> >>> > Newer versions of gcc generate unaligned accesses by default, causing
> >>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the
> >>> > -mno-unaligned-access option to gcc.
> >>>
> >>> This description doesn't make sense.  If we have alignment traps enabled,
> >>> then we _expect_ to fix up unaligned loads and stores.
> >>>
> >>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled.
> >>>
> >>> So what's the _actual_ problem that this is trying to address?  What's
> >>> the panic/oops look like?  And that information should be in the commit
> >>> description _anyway_.
> >>
> >> Does the patch below look better?
> >>
> >> We cannot move alignment_init() earlier as we don't know how early the
> >> compiler would generate unaligned accesses. An alternative is some
> >> #ifdef's in head.S. Please let me know which variant you prefer.
> >
> > ifdefs may be ugly, but I don't see a better solution here.  Crippling
> > the entire build to make a couple of lines slightly more aesthetically
> > pleasing doesn't seem right to me.
> 
> BTW, are we sure that the code generated with unaligned accesses is
> better? AFAIK, while processors support unaligned accesses, they may
> not always be optimal.

The code gcc generates to synthesise an unaligned access using aligned
accesses is pretty simplistic:

$ cat <<EOF | gcc -O2 -c unaligned.c && objdump -d unaligned.o
unsigned long readw(void *p)
{
	struct { unsigned long l; } __attribute__ (( __packed__ )) *s = p;

	return s->l;
}
EOF

00000000 <readw>:
   0:   7841            ldrb    r1, [r0, #1]
   2:   7803            ldrb    r3, [r0, #0]
   4:   7882            ldrb    r2, [r0, #2]
   6:   78c0            ldrb    r0, [r0, #3]
   8:   ea43 2301       orr.w   r3, r3, r1, lsl #8
   c:   ea43 4302       orr.w   r3, r3, r2, lsl #16
  10:   ea43 6000       orr.w   r0, r3, r0, lsl #24
  14:   4770            bx      lr
  16:   bf00            nop


For code which natively needs to read unaligned fields from data structures,
I sincerely doubt that the CPU will not beat the above code for efficiency...

So if there's code doing unaligned access to data structures for a good
reason, building with unaligned access support turned on in the compiler
seems a good idea, if that code might performance-critical for anything.


Most code should not be doing unaligned accesses even at the source level
though unless there's a good reason, since on average unaligned accesses
will not be quite as efficient as aligned accesses even if performed
natively by the CPU rather than being synthesised by the compiler.

Where are the observed faults coming from?  Maybe it's the faulting code
that's the problem here, not the compiler...

Cheers
---Dave
Nicolas Pitre May 24, 2011, 5:26 p.m. UTC | #5
On Tue, 24 May 2011, Måns Rullgård wrote:

> Catalin Marinas <catalin.marinas@arm.com> writes:
> 
> > 2011/5/24 Måns Rullgård <mans@mansr.com>:
> >> Catalin Marinas <catalin.marinas@arm.com> writes:
> >>
> >>> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote:
> >>>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote:
> >>>> > Newer versions of gcc generate unaligned accesses by default, causing
> >>>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the
> >>>> > -mno-unaligned-access option to gcc.
> >>>>
> >>>> This description doesn't make sense.  If we have alignment traps enabled,
> >>>> then we _expect_ to fix up unaligned loads and stores.
> >>>>
> >>>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled.
> >>>>
> >>>> So what's the _actual_ problem that this is trying to address?  What's
> >>>> the panic/oops look like?  And that information should be in the commit
> >>>> description _anyway_.
> >>>
> >>> Does the patch below look better?
> >>>
> >>> We cannot move alignment_init() earlier as we don't know how early the
> >>> compiler would generate unaligned accesses. An alternative is some
> >>> #ifdef's in head.S. Please let me know which variant you prefer.
> >>
> >> ifdefs may be ugly, but I don't see a better solution here.  Crippling
> >> the entire build to make a couple of lines slightly more aesthetically
> >> pleasing doesn't seem right to me.
> >
> > BTW, are we sure that the code generated with unaligned accesses is
> > better? AFAIK, while processors support unaligned accesses, they may
> > not always be optimal.
> 
> On Cortex-A8 unaligned loads not crossing a 128-bit boundary have no
> penalty.  Crossing a 128-bit boundary stalls for 9 cycles (measured, 8
> according to TRM).  On Cortex-A9, unaligned loads crossing a 64-bit
> boundary delay 7 cycles (measured), other misalignments have no penalty.
> On average these timings are faster than doing 4 byte loads and oring
> them together, which uses 6 cycles on A9, 9 cycles on A8.
> 
> It may of course be the case that unaligned accesses are rare enough
> that this isn't worth worrying about at all.

I think this is largely the case in the kernel.

If anything, unaligned accesses were never expected from the compiler 
before.  Preserving that behavior would be preferable until we actually 
identify a use case where doing otherwise is beneficial.

User space is a totally different thing of course.


Nicolas
Catalin Marinas May 25, 2011, 11:14 a.m. UTC | #6
On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
> On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
> > BTW, are we sure that the code generated with unaligned accesses is
> > better? AFAIK, while processors support unaligned accesses, they may
> > not always be optimal.
> 
> The code gcc generates to synthesise an unaligned access using aligned
> accesses is pretty simplistic:
...
> For code which natively needs to read unaligned fields from data structures,
> I sincerely doubt that the CPU will not beat the above code for efficiency...
> 
> So if there's code doing unaligned access to data structures for a good
> reason, building with unaligned access support turned on in the compiler
> seems a good idea, if that code might performance-critical for anything.

gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
function. We have a local variable like below (9 bytes):

	char empty_str[] = "--------";

and it looks like other stack accesses are unaligned:

c0082ba0 <pcpu_dump_alloc_info>:
c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
c0082bb0:   e58d1014    str r1, [sp, #20]
c0082bb4:   e58d0020    str r0, [sp, #32]
c0082bb8:   e8b40003    ldm r4!, {r0, r1}
c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
c0082bc0:   e59d0014    ldr r0, [sp, #20]
c0082bc4:   e5d43000    ldrb    r3, [r4]

I haven't tried with -mno-unaligned-access, I suspect the variables on
the stack would be aligned.
tip-bot for Dave Martin May 25, 2011, 12:43 p.m. UTC | #7
On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
> > > BTW, are we sure that the code generated with unaligned accesses is
> > > better? AFAIK, while processors support unaligned accesses, they may
> > > not always be optimal.
> > 
> > The code gcc generates to synthesise an unaligned access using aligned
> > accesses is pretty simplistic:
> ...
> > For code which natively needs to read unaligned fields from data structures,
> > I sincerely doubt that the CPU will not beat the above code for efficiency...
> > 
> > So if there's code doing unaligned access to data structures for a good
> > reason, building with unaligned access support turned on in the compiler
> > seems a good idea, if that code might performance-critical for anything.
> 
> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
> function. We have a local variable like below (9 bytes):
> 
> 	char empty_str[] = "--------";
> 
> and it looks like other stack accesses are unaligned:
> 
> c0082ba0 <pcpu_dump_alloc_info>:
> c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
> c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
> c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
> c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
> c0082bb0:   e58d1014    str r1, [sp, #20]
> c0082bb4:   e58d0020    str r0, [sp, #32]
> c0082bb8:   e8b40003    ldm r4!, {r0, r1}
> c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
> c0082bc0:   e59d0014    ldr r0, [sp, #20]
> c0082bc4:   e5d43000    ldrb    r3, [r4]
> 
> I haven't tried with -mno-unaligned-access, I suspect the variables on
> the stack would be aligned.

So, it looks like empty_str may be misaligned on the stack, and the compiler
is doing a misaligned store when initialising it.

Since the unaligned access support stuff is new, I'm suspicious of a
compiler bug here...  Can you follow up with your friendly neighbourhood
tools guys?

Cheers
---Dave
Mans Rullgard May 25, 2011, 1:32 p.m. UTC | #8
Dave Martin <dave.martin@linaro.org> writes:

> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
>> > > BTW, are we sure that the code generated with unaligned accesses is
>> > > better? AFAIK, while processors support unaligned accesses, they may
>> > > not always be optimal.
>> > 
>> > The code gcc generates to synthesise an unaligned access using aligned
>> > accesses is pretty simplistic:
>> ...
>> > For code which natively needs to read unaligned fields from data structures,
>> > I sincerely doubt that the CPU will not beat the above code for efficiency...
>> > 
>> > So if there's code doing unaligned access to data structures for a good
>> > reason, building with unaligned access support turned on in the compiler
>> > seems a good idea, if that code might performance-critical for anything.
>> 
>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
>> function. We have a local variable like below (9 bytes):
>> 
>> 	char empty_str[] = "--------";
>> 
>> and it looks like other stack accesses are unaligned:
>> 
>> c0082ba0 <pcpu_dump_alloc_info>:
>> c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
>> c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
>> c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
>> c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
>> c0082bb0:   e58d1014    str r1, [sp, #20]
>> c0082bb4:   e58d0020    str r0, [sp, #32]
>> c0082bb8:   e8b40003    ldm r4!, {r0, r1}
>> c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
>> c0082bc0:   e59d0014    ldr r0, [sp, #20]
>> c0082bc4:   e5d43000    ldrb    r3, [r4]
>> 
>> I haven't tried with -mno-unaligned-access, I suspect the variables on
>> the stack would be aligned.
>
> So, it looks like empty_str may be misaligned on the stack, and the compiler
> is doing a misaligned store when initialising it.

empty_str has type char[] so there are no alignment requirements.

> Since the unaligned access support stuff is new, I'm suspicious of a
> compiler bug here...  Can you follow up with your friendly neighbourhood
> tools guys?

The compiler might be doing something unintentional, but I see no
violation of any rules here.
tip-bot for Dave Martin May 25, 2011, 2:05 p.m. UTC | #9
On Wed, May 25, 2011 at 02:32:13PM +0100, Måns Rullgård wrote:
> Dave Martin <dave.martin@linaro.org> writes:
> 
> > On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
> >> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
> >> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
> >> > > BTW, are we sure that the code generated with unaligned accesses is
> >> > > better? AFAIK, while processors support unaligned accesses, they may
> >> > > not always be optimal.
> >> > 
> >> > The code gcc generates to synthesise an unaligned access using aligned
> >> > accesses is pretty simplistic:
> >> ...
> >> > For code which natively needs to read unaligned fields from data structures,
> >> > I sincerely doubt that the CPU will not beat the above code for efficiency...
> >> > 
> >> > So if there's code doing unaligned access to data structures for a good
> >> > reason, building with unaligned access support turned on in the compiler
> >> > seems a good idea, if that code might performance-critical for anything.
> >> 
> >> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
> >> function. We have a local variable like below (9 bytes):
> >> 
> >> 	char empty_str[] = "--------";
> >> 
> >> and it looks like other stack accesses are unaligned:
> >> 
> >> c0082ba0 <pcpu_dump_alloc_info>:
> >> c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
> >> c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
> >> c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
> >> c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
> >> c0082bb0:   e58d1014    str r1, [sp, #20]
> >> c0082bb4:   e58d0020    str r0, [sp, #32]
> >> c0082bb8:   e8b40003    ldm r4!, {r0, r1}
> >> c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
> >> c0082bc0:   e59d0014    ldr r0, [sp, #20]
> >> c0082bc4:   e5d43000    ldrb    r3, [r4]
> >> 
> >> I haven't tried with -mno-unaligned-access, I suspect the variables on
> >> the stack would be aligned.
> >
> > So, it looks like empty_str may be misaligned on the stack, and the compiler
> > is doing a misaligned store when initialising it.
> 
> empty_str has type char[] so there are no alignment requirements.
> 
> > Since the unaligned access support stuff is new, I'm suspicious of a
> > compiler bug here...  Can you follow up with your friendly neighbourhood
> > tools guys?
> 
> The compiler might be doing something unintentional, but I see no
> violation of any rules here.

Indeed -- mainly I'm wondering if this specific problem (unaligned
accesses generated by the compiler for normal C code) is one we will
ever need to deal with for kernel code.  I'd guess that the performance
of unaligned accesses will always be at least slightly suboptimal in
practice, so the compiler guys may well not intend to misalign data
just for the sake of it.

The answer might be yes or no; it depends on the compiler guys.

(By "normal C code" I exclude things like the __packed__ attribute,
inline asm, and and type-punning activities forbidden by the C language
standards.  By nature, these will affect specific parts of the kernel
rather than being universal.)

---Dave
Mans Rullgard May 25, 2011, 2:48 p.m. UTC | #10
Dave Martin <dave.martin@linaro.org> writes:

> On Wed, May 25, 2011 at 02:32:13PM +0100, Måns Rullgård wrote:
>> Dave Martin <dave.martin@linaro.org> writes:
>> 
>> > On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
>> >> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
>> >> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
>> >> > > BTW, are we sure that the code generated with unaligned accesses is
>> >> > > better? AFAIK, while processors support unaligned accesses, they may
>> >> > > not always be optimal.
>> >> > 
>> >> > The code gcc generates to synthesise an unaligned access using aligned
>> >> > accesses is pretty simplistic:
>> >> ...
>> >> > For code which natively needs to read unaligned fields from data structures,
>> >> > I sincerely doubt that the CPU will not beat the above code for efficiency...
>> >> > 
>> >> > So if there's code doing unaligned access to data structures for a good
>> >> > reason, building with unaligned access support turned on in the compiler
>> >> > seems a good idea, if that code might performance-critical for anything.
>> >> 
>> >> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
>> >> function. We have a local variable like below (9 bytes):
>> >> 
>> >> 	char empty_str[] = "--------";
>> >> 
>> >> and it looks like other stack accesses are unaligned:
>> >> 
>> >> c0082ba0 <pcpu_dump_alloc_info>:
>> >> c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
>> >> c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
>> >> c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
>> >> c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
>> >> c0082bb0:   e58d1014    str r1, [sp, #20]
>> >> c0082bb4:   e58d0020    str r0, [sp, #32]
>> >> c0082bb8:   e8b40003    ldm r4!, {r0, r1}
>> >> c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
>> >> c0082bc0:   e59d0014    ldr r0, [sp, #20]
>> >> c0082bc4:   e5d43000    ldrb    r3, [r4]
>> >> 
>> >> I haven't tried with -mno-unaligned-access, I suspect the variables on
>> >> the stack would be aligned.
>> >
>> > So, it looks like empty_str may be misaligned on the stack, and the compiler
>> > is doing a misaligned store when initialising it.
>> 
>> empty_str has type char[] so there are no alignment requirements.
>> 
>> > Since the unaligned access support stuff is new, I'm suspicious of a
>> > compiler bug here...  Can you follow up with your friendly neighbourhood
>> > tools guys?
>> 
>> The compiler might be doing something unintentional, but I see no
>> violation of any rules here.
>
> Indeed -- mainly I'm wondering if this specific problem (unaligned
> accesses generated by the compiler for normal C code) is one we will
> ever need to deal with for kernel code.  I'd guess that the performance
> of unaligned accesses will always be at least slightly suboptimal in
> practice, so the compiler guys may well not intend to misalign data
> just for the sake of it.

The only case where not aligning data on stack could make any sense at
all is if there are multiple odd-sized char (or short) arrays, in which
case packing them together would save a few bytes, and even this is
hardly worth the potential cost.

If gcc is indeed misaligning arrays on the stack, someone should have a
word with the compiler people and get them to fix it.
Catalin Marinas May 25, 2011, 2:50 p.m. UTC | #11
2011/5/25 Måns Rullgård <mans@mansr.com>:
> Dave Martin <dave.martin@linaro.org> writes:
>
>> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
>>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
>>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
>>> > > BTW, are we sure that the code generated with unaligned accesses is
>>> > > better? AFAIK, while processors support unaligned accesses, they may
>>> > > not always be optimal.
>>> >
>>> > The code gcc generates to synthesise an unaligned access using aligned
>>> > accesses is pretty simplistic:
>>> ...
>>> > For code which natively needs to read unaligned fields from data structures,
>>> > I sincerely doubt that the CPU will not beat the above code for efficiency...
>>> >
>>> > So if there's code doing unaligned access to data structures for a good
>>> > reason, building with unaligned access support turned on in the compiler
>>> > seems a good idea, if that code might performance-critical for anything.
>>>
>>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
>>> function. We have a local variable like below (9 bytes):
>>>
>>>      char empty_str[] = "--------";
>>>
>>> and it looks like other stack accesses are unaligned:
>>>
>>> c0082ba0 <pcpu_dump_alloc_info>:
>>> c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
>>> c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
>>> c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
>>> c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
>>> c0082bb0:   e58d1014    str r1, [sp, #20]
>>> c0082bb4:   e58d0020    str r0, [sp, #32]
>>> c0082bb8:   e8b40003    ldm r4!, {r0, r1}
>>> c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
>>> c0082bc0:   e59d0014    ldr r0, [sp, #20]
>>> c0082bc4:   e5d43000    ldrb    r3, [r4]
>>>
>>> I haven't tried with -mno-unaligned-access, I suspect the variables on
>>> the stack would be aligned.
>>
>> So, it looks like empty_str may be misaligned on the stack, and the compiler
>> is doing a misaligned store when initialising it.
>
> empty_str has type char[] so there are no alignment requirements.

I think the local variables after char empty_str[] are unaligned (int
alloc). Changing the array size to 16 solves the issue.

The gcc guys here in ARM will have a look and I'll get back to you.
Mans Rullgard May 25, 2011, 2:53 p.m. UTC | #12
Catalin Marinas <catalin.marinas@arm.com> writes:

> 2011/5/25 Måns Rullgård <mans@mansr.com>:
>> Dave Martin <dave.martin@linaro.org> writes:
>>
>>> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
>>>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
>>>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
>>>> > > BTW, are we sure that the code generated with unaligned accesses is
>>>> > > better? AFAIK, while processors support unaligned accesses, they may
>>>> > > not always be optimal.
>>>> >
>>>> > The code gcc generates to synthesise an unaligned access using aligned
>>>> > accesses is pretty simplistic:
>>>> ...
>>>> > For code which natively needs to read unaligned fields from data structures,
>>>> > I sincerely doubt that the CPU will not beat the above code for efficiency...
>>>> >
>>>> > So if there's code doing unaligned access to data structures for a good
>>>> > reason, building with unaligned access support turned on in the compiler
>>>> > seems a good idea, if that code might performance-critical for anything.
>>>>
>>>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
>>>> function. We have a local variable like below (9 bytes):
>>>>
>>>>      char empty_str[] = "--------";
>>>>
>>>> and it looks like other stack accesses are unaligned:
>>>>
>>>> c0082ba0 <pcpu_dump_alloc_info>:
>>>> c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
>>>> c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
>>>> c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
>>>> c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
>>>> c0082bb0:   e58d1014    str r1, [sp, #20]
>>>> c0082bb4:   e58d0020    str r0, [sp, #32]
>>>> c0082bb8:   e8b40003    ldm r4!, {r0, r1}
>>>> c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
>>>> c0082bc0:   e59d0014    ldr r0, [sp, #20]
>>>> c0082bc4:   e5d43000    ldrb    r3, [r4]
>>>>
>>>> I haven't tried with -mno-unaligned-access, I suspect the variables on
>>>> the stack would be aligned.
>>>
>>> So, it looks like empty_str may be misaligned on the stack, and the compiler
>>> is doing a misaligned store when initialising it.
>>
>> empty_str has type char[] so there are no alignment requirements.
>
> I think the local variables after char empty_str[] are unaligned (int
> alloc).

That definitely sounds like a bug.  Does passing the address of such a
variable somewhere make a difference?  Some ABI rules only apply to
actual pointers.
Will Deacon May 26, 2011, 5:10 p.m. UTC | #13
Right... [adding bunch of people to CC],


On Wed, 2011-05-25 at 15:50 +0100, Catalin Marinas wrote:
> 2011/5/25 Måns Rullgård <mans@mansr.com>:
> > Dave Martin <dave.martin@linaro.org> writes:
> >
> >> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote:
> >>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote:
> >>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote:
> >>> > > BTW, are we sure that the code generated with unaligned accesses is
> >>> > > better? AFAIK, while processors support unaligned accesses, they may
> >>> > > not always be optimal.
> >>> >
> >>> > The code gcc generates to synthesise an unaligned access using aligned
> >>> > accesses is pretty simplistic:
> >>> ...
> >>> > For code which natively needs to read unaligned fields from data structures,
> >>> > I sincerely doubt that the CPU will not beat the above code for efficiency...
> >>> >
> >>> > So if there's code doing unaligned access to data structures for a good
> >>> > reason, building with unaligned access support turned on in the compiler
> >>> > seems a good idea, if that code might performance-critical for anything.
> >>>
> >>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info()
> >>> function. We have a local variable like below (9 bytes):
> >>>
> >>>      char empty_str[] = "--------";
> >>>
> >>> and it looks like other stack accesses are unaligned:
> >>>
> >>> c0082ba0 <pcpu_dump_alloc_info>:
> >>> c0082ba0:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
> >>> c0082ba4:   e3074118    movw    r4, #28952  ; 0x7118
> >>> c0082ba8:   e24dd04c    sub sp, sp, #76 ; 0x4c
> >>> c0082bac:   e34c402a    movt    r4, #49194  ; 0xc02a
> >>> c0082bb0:   e58d1014    str r1, [sp, #20]
> >>> c0082bb4:   e58d0020    str r0, [sp, #32]
> >>> c0082bb8:   e8b40003    ldm r4!, {r0, r1}
> >>> c0082bbc:   e58d003f    str r0, [sp, #63]   <----------- !!!!!
> >>> c0082bc0:   e59d0014    ldr r0, [sp, #20]
> >>> c0082bc4:   e5d43000    ldrb    r3, [r4]
> >>>
> >>> I haven't tried with -mno-unaligned-access, I suspect the variables on
> >>> the stack would be aligned.
> >>
> >> So, it looks like empty_str may be misaligned on the stack, and the compiler
> >> is doing a misaligned store when initialising it.
> >
> > empty_str has type char[] so there are no alignment requirements.
> 
> I think the local variables after char empty_str[] are unaligned (int
> alloc). Changing the array size to 16 solves the issue.
> 
> The gcc guys here in ARM will have a look and I'll get back to you.

This issue seems to be caused by passing -fconserve-stack to GCC. This
was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc
4.5") and as you can see from the archive:

http://lkml.org/lkml/2009/9/20/39

it was thought to only have an impact on inlining decisions. Looking at
the documentation for GCC 4.6:

-fconserve-stack
          Attempt to minimize stack usage. The compiler will attempt to
use less stack space, even if that makes the program slower. This option
implies setting the ‘large-stack-frame’ parameter to 100 and the
‘large-stack-frame-growth’ parameter to 400.

So it sounds like we might not want to enable this blindly across all
architectures. Indeed, on ARM, it encourages the compiler to pack
variables on the stack which leads to the weird and wonderful alignment
situation that has been encountered in this thread.

Can we remove -fconserve-stack from the top-level Makefile (or at least
make it conditional by architecture)?

Will
Mans Rullgard May 26, 2011, 6:14 p.m. UTC | #14
Will Deacon <will.deacon@arm.com> writes:

> This issue seems to be caused by passing -fconserve-stack to GCC. This
> was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc
> 4.5") and as you can see from the archive:
>
> http://lkml.org/lkml/2009/9/20/39
>
> it was thought to only have an impact on inlining decisions. Looking at
> the documentation for GCC 4.6:
>
> -fconserve-stack
>           Attempt to minimize stack usage. The compiler will attempt to
> use less stack space, even if that makes the program slower. This option
> implies setting the ‘large-stack-frame’ parameter to 100 and the
> ‘large-stack-frame-growth’ parameter to 400.
>
> So it sounds like we might not want to enable this blindly across all
> architectures. Indeed, on ARM, it encourages the compiler to pack
> variables on the stack which leads to the weird and wonderful alignment
> situation that has been encountered in this thread.
>
> Can we remove -fconserve-stack from the top-level Makefile (or at least
> make it conditional by architecture)?

Sounds like a good idea to me.
Andi Kleen May 26, 2011, 7:58 p.m. UTC | #15
> So it sounds like we might not want to enable this blindly across all
> architectures. Indeed, on ARM, it encourages the compiler to pack
> variables on the stack which leads to the weird and wonderful alignment
> situation that has been encountered in this thread.

Interesting. I'm pretty sure it didn't do that when I added the flag.

Anyways making it a CONFIG is fine for me. Just don't set it on ARM. It 
should be set on x86
at least.

-Andi
Nicolas Pitre May 26, 2011, 9:03 p.m. UTC | #16
On Thu, 26 May 2011, Will Deacon wrote:

> This issue seems to be caused by passing -fconserve-stack to GCC. This
> was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc
> 4.5") and as you can see from the archive:
> 
> http://lkml.org/lkml/2009/9/20/39
> 
> it was thought to only have an impact on inlining decisions. Looking at
> the documentation for GCC 4.6:
> 
> -fconserve-stack
>           Attempt to minimize stack usage. The compiler will attempt to
> use less stack space, even if that makes the program slower. This option
> implies setting the ‘large-stack-frame’ parameter to 100 and the
> ‘large-stack-frame-growth’ parameter to 400.
> 
> So it sounds like we might not want to enable this blindly across all
> architectures. Indeed, on ARM, it encourages the compiler to pack
> variables on the stack which leads to the weird and wonderful alignment
> situation that has been encountered in this thread.
> 
> Can we remove -fconserve-stack from the top-level Makefile (or at least
> make it conditional by architecture)?

I think this is an orthogonal issue.

My opinion is that we should use -mno-unaligned-access by default on 
ARM.  The reason is that we've been expecting the compiler not to cause 
unaligned accesses for ages, and letting the compiler, for whatever 
reasons including things like -fconserve-stack, produce unaligned 
accesses behind our back is a change in behavior we might not always be 
prepared for.  Unaligned accesses in the kernel should be rare anyway, 
and allowing the compiler to generate them can be allowed for selected 
files when proven beneficial.

It is possible that -fconserve-stack is still valuable on ARM given that 
it is also used with -mno-unaligned-access for other things than 
structure packing on the stack, and therefore its merits can be debated 
independently from the alignment issue at hand.


Nicolas
Andi Kleen May 26, 2011, 9:10 p.m. UTC | #17
> It is possible that -fconserve-stack is still valuable on ARM given that
> it is also used with -mno-unaligned-access for other things than
> structure packing on the stack, and therefore its merits can be debated
> independently from the alignment issue at hand.

The big advantage of -fconserve-stack is that it throttles the inliner 
if the inlining
would cause too much stack growth. This is something you likely want
on ARM too, especially as code gets more and more complex.

-Andi
Mans Rullgard May 26, 2011, 9:26 p.m. UTC | #18
Andi Kleen <ak@linux.intel.com> writes:

>> It is possible that -fconserve-stack is still valuable on ARM given that
>> it is also used with -mno-unaligned-access for other things than
>> structure packing on the stack, and therefore its merits can be debated
>> independently from the alignment issue at hand.
>
> The big advantage of -fconserve-stack is that it throttles the inliner
> if the inlining would cause too much stack growth. This is something
> you likely want on ARM too, especially as code gets more and more
> complex.

Is there no way to get that effect without also activating the
aggressive packing?
Russell King - ARM Linux May 26, 2011, 9:51 p.m. UTC | #19
On Thu, May 26, 2011 at 05:03:39PM -0400, Nicolas Pitre wrote:
> It is possible that -fconserve-stack is still valuable on ARM given that 
> it is also used with -mno-unaligned-access for other things than 
> structure packing on the stack, and therefore its merits can be debated 
> independently from the alignment issue at hand.

Catalin said in his mail "I haven't tried with -mno-unaligned-access, I
suspect the variables on the stack would be aligned.".  So I don't think
we know enough to say whether -mno-unaligned-access avoids the stack
packing.
Andi Kleen May 26, 2011, 10:29 p.m. UTC | #20
> Catalin said in his mail "I haven't tried with -mno-unaligned-access, I
> suspect the variables on the stack would be aligned.".  So I don't think
> we know enough to say whether -mno-unaligned-access avoids the stack
> packing.

It won't, the arm gcc code just checks flag_conserve_stack.
IMHO it's just a gcc bug. You should report it to 
http://gcc.gnu.org/bugzilla

As a temporary workaround you can disable it in the kernel too, but as soon
as the compiler it's fixed I would recommend considering to reenable it.

-Andi
Catalin Marinas May 27, 2011, 8:38 a.m. UTC | #21
On Thu, May 26, 2011 at 10:51:01PM +0100, Russell King - ARM Linux wrote:
> On Thu, May 26, 2011 at 05:03:39PM -0400, Nicolas Pitre wrote:
> > It is possible that -fconserve-stack is still valuable on ARM given that
> > it is also used with -mno-unaligned-access for other things than
> > structure packing on the stack, and therefore its merits can be debated
> > independently from the alignment issue at hand.
> 
> Catalin said in his mail "I haven't tried with -mno-unaligned-access, I
> suspect the variables on the stack would be aligned.".  So I don't think
> we know enough to say whether -mno-unaligned-access avoids the stack
> packing.

OK, I tried this now:

-fconserve-stack: we get unaligned accesses on the stack because the
newer versions of gcc turned unaligned accesses on by default.

-fconserve-stack -mno-unaligned-access: the stack variables are aligned.
We probably get the benefit of -fconserve-stack as well.

So as per the initial post in this thread, we could have
-mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
Nicolas suggested, we could compile some files with -munaligned-access
(and maybe -fno-conserve-stack).

I raised this with the gcc guys so they are looking into it. But it
really doesn't look like a gcc bug as long as -mno-unaligned-access is
taken into account.
Russell King - ARM Linux May 27, 2011, 8:54 a.m. UTC | #22
On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote:
> OK, I tried this now:
> 
> -fconserve-stack: we get unaligned accesses on the stack because the
> newer versions of gcc turned unaligned accesses on by default.
> 
> -fconserve-stack -mno-unaligned-access: the stack variables are aligned.
> We probably get the benefit of -fconserve-stack as well.
> 
> So as per the initial post in this thread, we could have
> -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
> Nicolas suggested, we could compile some files with -munaligned-access
> (and maybe -fno-conserve-stack).
> 
> I raised this with the gcc guys so they are looking into it. But it
> really doesn't look like a gcc bug as long as -mno-unaligned-access is
> taken into account.

Ok, we need to check one last thing, and that's what the behaviour is
with -mno-unaligned-access and packed structures (such as the ethernet
header).  If it makes no difference, then I suggest we always build
with -mno-unaligned-access.
Catalin Marinas May 27, 2011, 9:51 a.m. UTC | #23
On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote:
> On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote:
> > OK, I tried this now:
> >
> > -fconserve-stack: we get unaligned accesses on the stack because the
> > newer versions of gcc turned unaligned accesses on by default.
> >
> > -fconserve-stack -mno-unaligned-access: the stack variables are aligned.
> > We probably get the benefit of -fconserve-stack as well.
> >
> > So as per the initial post in this thread, we could have
> > -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
> > Nicolas suggested, we could compile some files with -munaligned-access
> > (and maybe -fno-conserve-stack).
> >
> > I raised this with the gcc guys so they are looking into it. But it
> > really doesn't look like a gcc bug as long as -mno-unaligned-access is
> > taken into account.
> 
> Ok, we need to check one last thing, and that's what the behaviour is
> with -mno-unaligned-access and packed structures (such as the ethernet
> header).  If it makes no difference, then I suggest we always build
> with -mno-unaligned-access.

I tried some simple code below:

struct test {
	unsigned char a[6];
	unsigned long b;
} __attribute__((packed));

void set(struct test *t, unsigned long v)
{
	t->b = v;
}

int main(void)
{
	struct test t;
	
	set(&t, 10);

	return 0;
}

With -mno-unaligned-access in newer toolchains, the set() function looks
like this (compiled with -march=armv7):

00000000 <set>:
   0:   e7e7c451        ubfx    ip, r1, #8, #8
   4:   e7e72851        ubfx    r2, r1, #16, #8
   8:   e1a03c21        lsr     r3, r1, #24
   c:   e5c01006        strb    r1, [r0, #6]
  10:   e5c0c007        strb    ip, [r0, #7]
  14:   e5c02008        strb    r2, [r0, #8]
  18:   e5c03009        strb    r3, [r0, #9]
  1c:   e12fff1e        bx      lr

If I don't pass -mno-unaligned-access later toolchains use unaligned
accesses by default and the set() function is more efficient:

00000000 <set>:
   0:   e5801006        str     r1, [r0, #6]
   4:   e12fff1e        bx      lr

The problem is that in addition to that we also get unaligned stack
variables which are not really efficient. Either way we have a drawback
somewhere. We could argue that -fconserve-stack is badly implemented on
ARM.
Catalin Marinas May 27, 2011, 9:56 a.m. UTC | #24
On Fri, 2011-05-27 at 10:51 +0100, Catalin Marinas wrote:
> On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote:
> > On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote:
> > > OK, I tried this now:
> > >
> > > -fconserve-stack: we get unaligned accesses on the stack because the
> > > newer versions of gcc turned unaligned accesses on by default.
> > >
> > > -fconserve-stack -mno-unaligned-access: the stack variables are aligned.
> > > We probably get the benefit of -fconserve-stack as well.
> > >
> > > So as per the initial post in this thread, we could have
> > > -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As
> > > Nicolas suggested, we could compile some files with -munaligned-access
> > > (and maybe -fno-conserve-stack).
> > >
> > > I raised this with the gcc guys so they are looking into it. But it
> > > really doesn't look like a gcc bug as long as -mno-unaligned-access is
> > > taken into account.
> > 
> > Ok, we need to check one last thing, and that's what the behaviour is
> > with -mno-unaligned-access and packed structures (such as the ethernet
> > header).  If it makes no difference, then I suggest we always build
> > with -mno-unaligned-access.
> 
> I tried some simple code below:
> 
> struct test {
> 	unsigned char a[6];
> 	unsigned long b;
> } __attribute__((packed));
> 
> void set(struct test *t, unsigned long v)
> {
> 	t->b = v;
> }
> 
> int main(void)
> {
> 	struct test t;
> 	
> 	set(&t, 10);
> 
> 	return 0;
> }
> 
> With -mno-unaligned-access in newer toolchains, the set() function looks
> like this (compiled with -march=armv7):
> 
> 00000000 <set>:
>    0:   e7e7c451        ubfx    ip, r1, #8, #8
>    4:   e7e72851        ubfx    r2, r1, #16, #8
>    8:   e1a03c21        lsr     r3, r1, #24
>    c:   e5c01006        strb    r1, [r0, #6]
>   10:   e5c0c007        strb    ip, [r0, #7]
>   14:   e5c02008        strb    r2, [r0, #8]
>   18:   e5c03009        strb    r3, [r0, #9]
>   1c:   e12fff1e        bx      lr
> 
> If I don't pass -mno-unaligned-access later toolchains use unaligned
> accesses by default and the set() function is more efficient:
> 
> 00000000 <set>:
>    0:   e5801006        str     r1, [r0, #6]
>    4:   e12fff1e        bx      lr

For completeness, I tried with "unsigned short b" in the structure above
hoping that the compiler would notice that it is 16-bit aligned.
Unfortunately, it doesn't. Code below with -mno-unaligned-access:

00000000 <set>:
   0:   e1a03421        lsr     r3, r1, #8
   4:   e5c01006        strb    r1, [r0, #6]
   8:   e5c03007        strb    r3, [r0, #7]
   c:   e12fff1e        bx      lr
Will Deacon May 27, 2011, 10:05 a.m. UTC | #25
Hi Andi,

On Thu, 2011-05-26 at 22:10 +0100, Andi Kleen wrote:
> 
> > It is possible that -fconserve-stack is still valuable on ARM given that
> > it is also used with -mno-unaligned-access for other things than
> > structure packing on the stack, and therefore its merits can be debated
> > independently from the alignment issue at hand.
> 
> The big advantage of -fconserve-stack is that it throttles the inliner
> if the inlining
> would cause too much stack growth. This is something you likely want
> on ARM too, especially as code gets more and more complex.

Do you have any concrete examples of -fconserve-stack giving an overall
win that isn't in the noise? The fact that the GCC documentation
explicitly states that enabling the option can lead to `making the
program slower' does make me question why we're enabling it in the first
place.

>From private conversation, the GCC guys don't seem to think this is a
bug so I'm reluctant to open a bugzilla ticket.

Will
Mans Rullgard May 27, 2011, 12:46 p.m. UTC | #26
Catalin Marinas <catalin.marinas@arm.com> writes:

> On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote:
>> 
>> Ok, we need to check one last thing, and that's what the behaviour is
>> with -mno-unaligned-access and packed structures (such as the ethernet
>> header).  If it makes no difference, then I suggest we always build
>> with -mno-unaligned-access.
>
> I tried some simple code below:
>
> struct test {
> 	unsigned char a[6];
> 	unsigned long b;
> } __attribute__((packed));
>
> void set(struct test *t, unsigned long v)
> {
> 	t->b = v;
> }
>
> int main(void)
> {
> 	struct test t;
> 	
> 	set(&t, 10);
>
> 	return 0;
> }
>
> With -mno-unaligned-access in newer toolchains, the set() function looks
> like this (compiled with -march=armv7):
>
> 00000000 <set>:
>    0:   e7e7c451        ubfx    ip, r1, #8, #8
>    4:   e7e72851        ubfx    r2, r1, #16, #8
>    8:   e1a03c21        lsr     r3, r1, #24
>    c:   e5c01006        strb    r1, [r0, #6]
>   10:   e5c0c007        strb    ip, [r0, #7]
>   14:   e5c02008        strb    r2, [r0, #8]
>   18:   e5c03009        strb    r3, [r0, #9]
>   1c:   e12fff1e        bx      lr
>
> If I don't pass -mno-unaligned-access later toolchains use unaligned
> accesses by default and the set() function is more efficient:
>
> 00000000 <set>:
>    0:   e5801006        str     r1, [r0, #6]
>    4:   e12fff1e        bx      lr

This is certainly something we should want.  Although some people
expressed concerns over introducing unaligned accesses where there were
previously none, I don't see how this could pose a problem as long as we
make sure strict alignment checking is off.  Some basic testing of code
paths known to use unaligned accesses should suffice IMO.

> The problem is that in addition to that we also get unaligned stack
> variables which are not really efficient. Either way we have a drawback
> somewhere. We could argue that -fconserve-stack is badly implemented on
> ARM.

Unless someone can demonstrate a clear win from -fconserve-stack, I
think it's pretty obvious that this flag does more harm than good on
ARM, especially in conjunction with unaligned accesses being allowed.

If the stack packing could be disabled while retaining the other
(presumably beneficial) effects of -fconserve-stack, it might be
reconsidered.
Andi Kleen May 27, 2011, 4:53 p.m. UTC | #27
> Do you have any concrete examples of -fconserve-stack giving an overall
> win that isn't in the noise? The fact that the GCC documentation
> explicitly states that enabling the option can lead to `making the
> program slower' does make me question why we're enabling it in the first
> place.

Because the kernel has a limited stack. We had a few cases in the past where
inlining blew it, especially in large ioctl switch() functions which inlined
lots of others.

On modern gccs it's better because it is smarter about sharing stack 
slots in large
functions. This was also worked around with manual noinlines.

But it's still far safer to tell gcc to conserve stack.

I consider the ARM gcc behaviour just a bug. The thing was really only 
intended
for the inliner (I asked for it originally)

-Andi

Patch
diff mbox

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index c7d321a..1c383d0 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -101,6 +101,10 @@  ifeq ($(CONFIG_ARM_UNWIND),y)
 CFLAGS_ABI	+=-funwind-tables
 endif
 
+ifeq ($(CONFIG_ALIGNMENT_TRAP),y)
+CFLAGS_ABI	+=$(call cc-option,-mno-unaligned-access,)
+endif
+
 ifeq ($(CONFIG_THUMB2_KERNEL),y)
 AFLAGS_AUTOIT	:=$(call as-option,-Wa$(comma)-mimplicit-it=always,-Wa$(comma)-mauto-it)
 AFLAGS_NOWARN	:=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)