Message ID | 1306229953.19557.14.camel@e102109-lin.cambridge.arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Catalin Marinas <catalin.marinas@arm.com> writes: > On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote: >> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote: >> > Newer versions of gcc generate unaligned accesses by default, causing >> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the >> > -mno-unaligned-access option to gcc. >> >> This description doesn't make sense. If we have alignment traps enabled, >> then we _expect_ to fix up unaligned loads and stores. >> >> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled. >> >> So what's the _actual_ problem that this is trying to address? What's >> the panic/oops look like? And that information should be in the commit >> description _anyway_. > > Does the patch below look better? > > We cannot move alignment_init() earlier as we don't know how early the > compiler would generate unaligned accesses. An alternative is some > #ifdef's in head.S. Please let me know which variant you prefer. ifdefs may be ugly, but I don't see a better solution here. Crippling the entire build to make a couple of lines slightly more aesthetically pleasing doesn't seem right to me.
2011/5/24 Måns Rullgård <mans@mansr.com>: > Catalin Marinas <catalin.marinas@arm.com> writes: > >> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote: >>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote: >>> > Newer versions of gcc generate unaligned accesses by default, causing >>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the >>> > -mno-unaligned-access option to gcc. >>> >>> This description doesn't make sense. If we have alignment traps enabled, >>> then we _expect_ to fix up unaligned loads and stores. >>> >>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled. >>> >>> So what's the _actual_ problem that this is trying to address? What's >>> the panic/oops look like? And that information should be in the commit >>> description _anyway_. >> >> Does the patch below look better? >> >> We cannot move alignment_init() earlier as we don't know how early the >> compiler would generate unaligned accesses. An alternative is some >> #ifdef's in head.S. Please let me know which variant you prefer. > > ifdefs may be ugly, but I don't see a better solution here. Crippling > the entire build to make a couple of lines slightly more aesthetically > pleasing doesn't seem right to me. BTW, are we sure that the code generated with unaligned accesses is better? AFAIK, while processors support unaligned accesses, they may not always be optimal.
Catalin Marinas <catalin.marinas@arm.com> writes: > 2011/5/24 Måns Rullgård <mans@mansr.com>: >> Catalin Marinas <catalin.marinas@arm.com> writes: >> >>> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote: >>>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote: >>>> > Newer versions of gcc generate unaligned accesses by default, causing >>>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the >>>> > -mno-unaligned-access option to gcc. >>>> >>>> This description doesn't make sense. If we have alignment traps enabled, >>>> then we _expect_ to fix up unaligned loads and stores. >>>> >>>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled. >>>> >>>> So what's the _actual_ problem that this is trying to address? What's >>>> the panic/oops look like? And that information should be in the commit >>>> description _anyway_. >>> >>> Does the patch below look better? >>> >>> We cannot move alignment_init() earlier as we don't know how early the >>> compiler would generate unaligned accesses. An alternative is some >>> #ifdef's in head.S. Please let me know which variant you prefer. >> >> ifdefs may be ugly, but I don't see a better solution here. Crippling >> the entire build to make a couple of lines slightly more aesthetically >> pleasing doesn't seem right to me. > > BTW, are we sure that the code generated with unaligned accesses is > better? AFAIK, while processors support unaligned accesses, they may > not always be optimal. On Cortex-A8 unaligned loads not crossing a 128-bit boundary have no penalty. Crossing a 128-bit boundary stalls for 9 cycles (measured, 8 according to TRM). On Cortex-A9, unaligned loads crossing a 64-bit boundary delay 7 cycles (measured), other misalignments have no penalty. On average these timings are faster than doing 4 byte loads and oring them together, which uses 6 cycles on A9, 9 cycles on A8. It may of course be the case that unaligned accesses are rare enough that this isn't worth worrying about at all.
On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: > 2011/5/24 Måns Rullgård <mans@mansr.com>: > > Catalin Marinas <catalin.marinas@arm.com> writes: > > > >> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote: > >>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote: > >>> > Newer versions of gcc generate unaligned accesses by default, causing > >>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the > >>> > -mno-unaligned-access option to gcc. > >>> > >>> This description doesn't make sense. If we have alignment traps enabled, > >>> then we _expect_ to fix up unaligned loads and stores. > >>> > >>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled. > >>> > >>> So what's the _actual_ problem that this is trying to address? What's > >>> the panic/oops look like? And that information should be in the commit > >>> description _anyway_. > >> > >> Does the patch below look better? > >> > >> We cannot move alignment_init() earlier as we don't know how early the > >> compiler would generate unaligned accesses. An alternative is some > >> #ifdef's in head.S. Please let me know which variant you prefer. > > > > ifdefs may be ugly, but I don't see a better solution here. Crippling > > the entire build to make a couple of lines slightly more aesthetically > > pleasing doesn't seem right to me. > > BTW, are we sure that the code generated with unaligned accesses is > better? AFAIK, while processors support unaligned accesses, they may > not always be optimal. The code gcc generates to synthesise an unaligned access using aligned accesses is pretty simplistic: $ cat <<EOF | gcc -O2 -c unaligned.c && objdump -d unaligned.o unsigned long readw(void *p) { struct { unsigned long l; } __attribute__ (( __packed__ )) *s = p; return s->l; } EOF 00000000 <readw>: 0: 7841 ldrb r1, [r0, #1] 2: 7803 ldrb r3, [r0, #0] 4: 7882 ldrb r2, [r0, #2] 6: 78c0 ldrb r0, [r0, #3] 8: ea43 2301 orr.w r3, r3, r1, lsl #8 c: ea43 4302 orr.w r3, r3, r2, lsl #16 10: ea43 6000 orr.w r0, r3, r0, lsl #24 14: 4770 bx lr 16: bf00 nop For code which natively needs to read unaligned fields from data structures, I sincerely doubt that the CPU will not beat the above code for efficiency... So if there's code doing unaligned access to data structures for a good reason, building with unaligned access support turned on in the compiler seems a good idea, if that code might performance-critical for anything. Most code should not be doing unaligned accesses even at the source level though unless there's a good reason, since on average unaligned accesses will not be quite as efficient as aligned accesses even if performed natively by the CPU rather than being synthesised by the compiler. Where are the observed faults coming from? Maybe it's the faulting code that's the problem here, not the compiler... Cheers ---Dave
On Tue, 24 May 2011, Måns Rullgård wrote: > Catalin Marinas <catalin.marinas@arm.com> writes: > > > 2011/5/24 Måns Rullgård <mans@mansr.com>: > >> Catalin Marinas <catalin.marinas@arm.com> writes: > >> > >>> On Mon, 2011-05-23 at 14:21 +0100, Russell King - ARM Linux wrote: > >>>> On Mon, May 23, 2011 at 12:16:48PM +0100, Catalin Marinas wrote: > >>>> > Newer versions of gcc generate unaligned accesses by default, causing > >>>> > kernel panics when CONFIG_ALIGNMENT_TRAP is enabled. This patch adds the > >>>> > -mno-unaligned-access option to gcc. > >>>> > >>>> This description doesn't make sense. If we have alignment traps enabled, > >>>> then we _expect_ to fix up unaligned loads and stores. > >>>> > >>>> Therefore there should be no panic if CONFIG_ALIGNMENT_TRAP is enabled. > >>>> > >>>> So what's the _actual_ problem that this is trying to address? What's > >>>> the panic/oops look like? And that information should be in the commit > >>>> description _anyway_. > >>> > >>> Does the patch below look better? > >>> > >>> We cannot move alignment_init() earlier as we don't know how early the > >>> compiler would generate unaligned accesses. An alternative is some > >>> #ifdef's in head.S. Please let me know which variant you prefer. > >> > >> ifdefs may be ugly, but I don't see a better solution here. Crippling > >> the entire build to make a couple of lines slightly more aesthetically > >> pleasing doesn't seem right to me. > > > > BTW, are we sure that the code generated with unaligned accesses is > > better? AFAIK, while processors support unaligned accesses, they may > > not always be optimal. > > On Cortex-A8 unaligned loads not crossing a 128-bit boundary have no > penalty. Crossing a 128-bit boundary stalls for 9 cycles (measured, 8 > according to TRM). On Cortex-A9, unaligned loads crossing a 64-bit > boundary delay 7 cycles (measured), other misalignments have no penalty. > On average these timings are faster than doing 4 byte loads and oring > them together, which uses 6 cycles on A9, 9 cycles on A8. > > It may of course be the case that unaligned accesses are rare enough > that this isn't worth worrying about at all. I think this is largely the case in the kernel. If anything, unaligned accesses were never expected from the compiler before. Preserving that behavior would be preferable until we actually identify a use case where doing otherwise is beneficial. User space is a totally different thing of course. Nicolas
On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: > > BTW, are we sure that the code generated with unaligned accesses is > > better? AFAIK, while processors support unaligned accesses, they may > > not always be optimal. > > The code gcc generates to synthesise an unaligned access using aligned > accesses is pretty simplistic: ... > For code which natively needs to read unaligned fields from data structures, > I sincerely doubt that the CPU will not beat the above code for efficiency... > > So if there's code doing unaligned access to data structures for a good > reason, building with unaligned access support turned on in the compiler > seems a good idea, if that code might performance-critical for anything. gcc generates unaligned accesses in the the pcpu_dump_alloc_info() function. We have a local variable like below (9 bytes): char empty_str[] = "--------"; and it looks like other stack accesses are unaligned: c0082ba0 <pcpu_dump_alloc_info>: c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} c0082ba4: e3074118 movw r4, #28952 ; 0x7118 c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c c0082bac: e34c402a movt r4, #49194 ; 0xc02a c0082bb0: e58d1014 str r1, [sp, #20] c0082bb4: e58d0020 str r0, [sp, #32] c0082bb8: e8b40003 ldm r4!, {r0, r1} c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! c0082bc0: e59d0014 ldr r0, [sp, #20] c0082bc4: e5d43000 ldrb r3, [r4] I haven't tried with -mno-unaligned-access, I suspect the variables on the stack would be aligned.
On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote: > On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: > > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: > > > BTW, are we sure that the code generated with unaligned accesses is > > > better? AFAIK, while processors support unaligned accesses, they may > > > not always be optimal. > > > > The code gcc generates to synthesise an unaligned access using aligned > > accesses is pretty simplistic: > ... > > For code which natively needs to read unaligned fields from data structures, > > I sincerely doubt that the CPU will not beat the above code for efficiency... > > > > So if there's code doing unaligned access to data structures for a good > > reason, building with unaligned access support turned on in the compiler > > seems a good idea, if that code might performance-critical for anything. > > gcc generates unaligned accesses in the the pcpu_dump_alloc_info() > function. We have a local variable like below (9 bytes): > > char empty_str[] = "--------"; > > and it looks like other stack accesses are unaligned: > > c0082ba0 <pcpu_dump_alloc_info>: > c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} > c0082ba4: e3074118 movw r4, #28952 ; 0x7118 > c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c > c0082bac: e34c402a movt r4, #49194 ; 0xc02a > c0082bb0: e58d1014 str r1, [sp, #20] > c0082bb4: e58d0020 str r0, [sp, #32] > c0082bb8: e8b40003 ldm r4!, {r0, r1} > c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! > c0082bc0: e59d0014 ldr r0, [sp, #20] > c0082bc4: e5d43000 ldrb r3, [r4] > > I haven't tried with -mno-unaligned-access, I suspect the variables on > the stack would be aligned. So, it looks like empty_str may be misaligned on the stack, and the compiler is doing a misaligned store when initialising it. Since the unaligned access support stuff is new, I'm suspicious of a compiler bug here... Can you follow up with your friendly neighbourhood tools guys? Cheers ---Dave
Dave Martin <dave.martin@linaro.org> writes: > On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote: >> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: >> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: >> > > BTW, are we sure that the code generated with unaligned accesses is >> > > better? AFAIK, while processors support unaligned accesses, they may >> > > not always be optimal. >> > >> > The code gcc generates to synthesise an unaligned access using aligned >> > accesses is pretty simplistic: >> ... >> > For code which natively needs to read unaligned fields from data structures, >> > I sincerely doubt that the CPU will not beat the above code for efficiency... >> > >> > So if there's code doing unaligned access to data structures for a good >> > reason, building with unaligned access support turned on in the compiler >> > seems a good idea, if that code might performance-critical for anything. >> >> gcc generates unaligned accesses in the the pcpu_dump_alloc_info() >> function. We have a local variable like below (9 bytes): >> >> char empty_str[] = "--------"; >> >> and it looks like other stack accesses are unaligned: >> >> c0082ba0 <pcpu_dump_alloc_info>: >> c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} >> c0082ba4: e3074118 movw r4, #28952 ; 0x7118 >> c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c >> c0082bac: e34c402a movt r4, #49194 ; 0xc02a >> c0082bb0: e58d1014 str r1, [sp, #20] >> c0082bb4: e58d0020 str r0, [sp, #32] >> c0082bb8: e8b40003 ldm r4!, {r0, r1} >> c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! >> c0082bc0: e59d0014 ldr r0, [sp, #20] >> c0082bc4: e5d43000 ldrb r3, [r4] >> >> I haven't tried with -mno-unaligned-access, I suspect the variables on >> the stack would be aligned. > > So, it looks like empty_str may be misaligned on the stack, and the compiler > is doing a misaligned store when initialising it. empty_str has type char[] so there are no alignment requirements. > Since the unaligned access support stuff is new, I'm suspicious of a > compiler bug here... Can you follow up with your friendly neighbourhood > tools guys? The compiler might be doing something unintentional, but I see no violation of any rules here.
On Wed, May 25, 2011 at 02:32:13PM +0100, Måns Rullgård wrote: > Dave Martin <dave.martin@linaro.org> writes: > > > On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote: > >> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: > >> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: > >> > > BTW, are we sure that the code generated with unaligned accesses is > >> > > better? AFAIK, while processors support unaligned accesses, they may > >> > > not always be optimal. > >> > > >> > The code gcc generates to synthesise an unaligned access using aligned > >> > accesses is pretty simplistic: > >> ... > >> > For code which natively needs to read unaligned fields from data structures, > >> > I sincerely doubt that the CPU will not beat the above code for efficiency... > >> > > >> > So if there's code doing unaligned access to data structures for a good > >> > reason, building with unaligned access support turned on in the compiler > >> > seems a good idea, if that code might performance-critical for anything. > >> > >> gcc generates unaligned accesses in the the pcpu_dump_alloc_info() > >> function. We have a local variable like below (9 bytes): > >> > >> char empty_str[] = "--------"; > >> > >> and it looks like other stack accesses are unaligned: > >> > >> c0082ba0 <pcpu_dump_alloc_info>: > >> c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} > >> c0082ba4: e3074118 movw r4, #28952 ; 0x7118 > >> c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c > >> c0082bac: e34c402a movt r4, #49194 ; 0xc02a > >> c0082bb0: e58d1014 str r1, [sp, #20] > >> c0082bb4: e58d0020 str r0, [sp, #32] > >> c0082bb8: e8b40003 ldm r4!, {r0, r1} > >> c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! > >> c0082bc0: e59d0014 ldr r0, [sp, #20] > >> c0082bc4: e5d43000 ldrb r3, [r4] > >> > >> I haven't tried with -mno-unaligned-access, I suspect the variables on > >> the stack would be aligned. > > > > So, it looks like empty_str may be misaligned on the stack, and the compiler > > is doing a misaligned store when initialising it. > > empty_str has type char[] so there are no alignment requirements. > > > Since the unaligned access support stuff is new, I'm suspicious of a > > compiler bug here... Can you follow up with your friendly neighbourhood > > tools guys? > > The compiler might be doing something unintentional, but I see no > violation of any rules here. Indeed -- mainly I'm wondering if this specific problem (unaligned accesses generated by the compiler for normal C code) is one we will ever need to deal with for kernel code. I'd guess that the performance of unaligned accesses will always be at least slightly suboptimal in practice, so the compiler guys may well not intend to misalign data just for the sake of it. The answer might be yes or no; it depends on the compiler guys. (By "normal C code" I exclude things like the __packed__ attribute, inline asm, and and type-punning activities forbidden by the C language standards. By nature, these will affect specific parts of the kernel rather than being universal.) ---Dave
Dave Martin <dave.martin@linaro.org> writes: > On Wed, May 25, 2011 at 02:32:13PM +0100, Måns Rullgård wrote: >> Dave Martin <dave.martin@linaro.org> writes: >> >> > On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote: >> >> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: >> >> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: >> >> > > BTW, are we sure that the code generated with unaligned accesses is >> >> > > better? AFAIK, while processors support unaligned accesses, they may >> >> > > not always be optimal. >> >> > >> >> > The code gcc generates to synthesise an unaligned access using aligned >> >> > accesses is pretty simplistic: >> >> ... >> >> > For code which natively needs to read unaligned fields from data structures, >> >> > I sincerely doubt that the CPU will not beat the above code for efficiency... >> >> > >> >> > So if there's code doing unaligned access to data structures for a good >> >> > reason, building with unaligned access support turned on in the compiler >> >> > seems a good idea, if that code might performance-critical for anything. >> >> >> >> gcc generates unaligned accesses in the the pcpu_dump_alloc_info() >> >> function. We have a local variable like below (9 bytes): >> >> >> >> char empty_str[] = "--------"; >> >> >> >> and it looks like other stack accesses are unaligned: >> >> >> >> c0082ba0 <pcpu_dump_alloc_info>: >> >> c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} >> >> c0082ba4: e3074118 movw r4, #28952 ; 0x7118 >> >> c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c >> >> c0082bac: e34c402a movt r4, #49194 ; 0xc02a >> >> c0082bb0: e58d1014 str r1, [sp, #20] >> >> c0082bb4: e58d0020 str r0, [sp, #32] >> >> c0082bb8: e8b40003 ldm r4!, {r0, r1} >> >> c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! >> >> c0082bc0: e59d0014 ldr r0, [sp, #20] >> >> c0082bc4: e5d43000 ldrb r3, [r4] >> >> >> >> I haven't tried with -mno-unaligned-access, I suspect the variables on >> >> the stack would be aligned. >> > >> > So, it looks like empty_str may be misaligned on the stack, and the compiler >> > is doing a misaligned store when initialising it. >> >> empty_str has type char[] so there are no alignment requirements. >> >> > Since the unaligned access support stuff is new, I'm suspicious of a >> > compiler bug here... Can you follow up with your friendly neighbourhood >> > tools guys? >> >> The compiler might be doing something unintentional, but I see no >> violation of any rules here. > > Indeed -- mainly I'm wondering if this specific problem (unaligned > accesses generated by the compiler for normal C code) is one we will > ever need to deal with for kernel code. I'd guess that the performance > of unaligned accesses will always be at least slightly suboptimal in > practice, so the compiler guys may well not intend to misalign data > just for the sake of it. The only case where not aligning data on stack could make any sense at all is if there are multiple odd-sized char (or short) arrays, in which case packing them together would save a few bytes, and even this is hardly worth the potential cost. If gcc is indeed misaligning arrays on the stack, someone should have a word with the compiler people and get them to fix it.
2011/5/25 Måns Rullgård <mans@mansr.com>: > Dave Martin <dave.martin@linaro.org> writes: > >> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote: >>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: >>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: >>> > > BTW, are we sure that the code generated with unaligned accesses is >>> > > better? AFAIK, while processors support unaligned accesses, they may >>> > > not always be optimal. >>> > >>> > The code gcc generates to synthesise an unaligned access using aligned >>> > accesses is pretty simplistic: >>> ... >>> > For code which natively needs to read unaligned fields from data structures, >>> > I sincerely doubt that the CPU will not beat the above code for efficiency... >>> > >>> > So if there's code doing unaligned access to data structures for a good >>> > reason, building with unaligned access support turned on in the compiler >>> > seems a good idea, if that code might performance-critical for anything. >>> >>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info() >>> function. We have a local variable like below (9 bytes): >>> >>> char empty_str[] = "--------"; >>> >>> and it looks like other stack accesses are unaligned: >>> >>> c0082ba0 <pcpu_dump_alloc_info>: >>> c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} >>> c0082ba4: e3074118 movw r4, #28952 ; 0x7118 >>> c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c >>> c0082bac: e34c402a movt r4, #49194 ; 0xc02a >>> c0082bb0: e58d1014 str r1, [sp, #20] >>> c0082bb4: e58d0020 str r0, [sp, #32] >>> c0082bb8: e8b40003 ldm r4!, {r0, r1} >>> c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! >>> c0082bc0: e59d0014 ldr r0, [sp, #20] >>> c0082bc4: e5d43000 ldrb r3, [r4] >>> >>> I haven't tried with -mno-unaligned-access, I suspect the variables on >>> the stack would be aligned. >> >> So, it looks like empty_str may be misaligned on the stack, and the compiler >> is doing a misaligned store when initialising it. > > empty_str has type char[] so there are no alignment requirements. I think the local variables after char empty_str[] are unaligned (int alloc). Changing the array size to 16 solves the issue. The gcc guys here in ARM will have a look and I'll get back to you.
Catalin Marinas <catalin.marinas@arm.com> writes: > 2011/5/25 Måns Rullgård <mans@mansr.com>: >> Dave Martin <dave.martin@linaro.org> writes: >> >>> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote: >>>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: >>>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: >>>> > > BTW, are we sure that the code generated with unaligned accesses is >>>> > > better? AFAIK, while processors support unaligned accesses, they may >>>> > > not always be optimal. >>>> > >>>> > The code gcc generates to synthesise an unaligned access using aligned >>>> > accesses is pretty simplistic: >>>> ... >>>> > For code which natively needs to read unaligned fields from data structures, >>>> > I sincerely doubt that the CPU will not beat the above code for efficiency... >>>> > >>>> > So if there's code doing unaligned access to data structures for a good >>>> > reason, building with unaligned access support turned on in the compiler >>>> > seems a good idea, if that code might performance-critical for anything. >>>> >>>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info() >>>> function. We have a local variable like below (9 bytes): >>>> >>>> char empty_str[] = "--------"; >>>> >>>> and it looks like other stack accesses are unaligned: >>>> >>>> c0082ba0 <pcpu_dump_alloc_info>: >>>> c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} >>>> c0082ba4: e3074118 movw r4, #28952 ; 0x7118 >>>> c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c >>>> c0082bac: e34c402a movt r4, #49194 ; 0xc02a >>>> c0082bb0: e58d1014 str r1, [sp, #20] >>>> c0082bb4: e58d0020 str r0, [sp, #32] >>>> c0082bb8: e8b40003 ldm r4!, {r0, r1} >>>> c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! >>>> c0082bc0: e59d0014 ldr r0, [sp, #20] >>>> c0082bc4: e5d43000 ldrb r3, [r4] >>>> >>>> I haven't tried with -mno-unaligned-access, I suspect the variables on >>>> the stack would be aligned. >>> >>> So, it looks like empty_str may be misaligned on the stack, and the compiler >>> is doing a misaligned store when initialising it. >> >> empty_str has type char[] so there are no alignment requirements. > > I think the local variables after char empty_str[] are unaligned (int > alloc). That definitely sounds like a bug. Does passing the address of such a variable somewhere make a difference? Some ABI rules only apply to actual pointers.
Right... [adding bunch of people to CC], On Wed, 2011-05-25 at 15:50 +0100, Catalin Marinas wrote: > 2011/5/25 Måns Rullgård <mans@mansr.com>: > > Dave Martin <dave.martin@linaro.org> writes: > > > >> On Wed, May 25, 2011 at 12:14:08PM +0100, Catalin Marinas wrote: > >>> On Tue, May 24, 2011 at 06:13:31PM +0100, Dave Martin wrote: > >>> > On Tue, May 24, 2011 at 04:26:35PM +0100, Catalin Marinas wrote: > >>> > > BTW, are we sure that the code generated with unaligned accesses is > >>> > > better? AFAIK, while processors support unaligned accesses, they may > >>> > > not always be optimal. > >>> > > >>> > The code gcc generates to synthesise an unaligned access using aligned > >>> > accesses is pretty simplistic: > >>> ... > >>> > For code which natively needs to read unaligned fields from data structures, > >>> > I sincerely doubt that the CPU will not beat the above code for efficiency... > >>> > > >>> > So if there's code doing unaligned access to data structures for a good > >>> > reason, building with unaligned access support turned on in the compiler > >>> > seems a good idea, if that code might performance-critical for anything. > >>> > >>> gcc generates unaligned accesses in the the pcpu_dump_alloc_info() > >>> function. We have a local variable like below (9 bytes): > >>> > >>> char empty_str[] = "--------"; > >>> > >>> and it looks like other stack accesses are unaligned: > >>> > >>> c0082ba0 <pcpu_dump_alloc_info>: > >>> c0082ba0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr} > >>> c0082ba4: e3074118 movw r4, #28952 ; 0x7118 > >>> c0082ba8: e24dd04c sub sp, sp, #76 ; 0x4c > >>> c0082bac: e34c402a movt r4, #49194 ; 0xc02a > >>> c0082bb0: e58d1014 str r1, [sp, #20] > >>> c0082bb4: e58d0020 str r0, [sp, #32] > >>> c0082bb8: e8b40003 ldm r4!, {r0, r1} > >>> c0082bbc: e58d003f str r0, [sp, #63] <----------- !!!!! > >>> c0082bc0: e59d0014 ldr r0, [sp, #20] > >>> c0082bc4: e5d43000 ldrb r3, [r4] > >>> > >>> I haven't tried with -mno-unaligned-access, I suspect the variables on > >>> the stack would be aligned. > >> > >> So, it looks like empty_str may be misaligned on the stack, and the compiler > >> is doing a misaligned store when initialising it. > > > > empty_str has type char[] so there are no alignment requirements. > > I think the local variables after char empty_str[] are unaligned (int > alloc). Changing the array size to 16 solves the issue. > > The gcc guys here in ARM will have a look and I'll get back to you. This issue seems to be caused by passing -fconserve-stack to GCC. This was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc 4.5") and as you can see from the archive: http://lkml.org/lkml/2009/9/20/39 it was thought to only have an impact on inlining decisions. Looking at the documentation for GCC 4.6: -fconserve-stack Attempt to minimize stack usage. The compiler will attempt to use less stack space, even if that makes the program slower. This option implies setting the ‘large-stack-frame’ parameter to 100 and the ‘large-stack-frame-growth’ parameter to 400. So it sounds like we might not want to enable this blindly across all architectures. Indeed, on ARM, it encourages the compiler to pack variables on the stack which leads to the weird and wonderful alignment situation that has been encountered in this thread. Can we remove -fconserve-stack from the top-level Makefile (or at least make it conditional by architecture)? Will
Will Deacon <will.deacon@arm.com> writes: > This issue seems to be caused by passing -fconserve-stack to GCC. This > was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc > 4.5") and as you can see from the archive: > > http://lkml.org/lkml/2009/9/20/39 > > it was thought to only have an impact on inlining decisions. Looking at > the documentation for GCC 4.6: > > -fconserve-stack > Attempt to minimize stack usage. The compiler will attempt to > use less stack space, even if that makes the program slower. This option > implies setting the ‘large-stack-frame’ parameter to 100 and the > ‘large-stack-frame-growth’ parameter to 400. > > So it sounds like we might not want to enable this blindly across all > architectures. Indeed, on ARM, it encourages the compiler to pack > variables on the stack which leads to the weird and wonderful alignment > situation that has been encountered in this thread. > > Can we remove -fconserve-stack from the top-level Makefile (or at least > make it conditional by architecture)? Sounds like a good idea to me.
> So it sounds like we might not want to enable this blindly across all > architectures. Indeed, on ARM, it encourages the compiler to pack > variables on the stack which leads to the weird and wonderful alignment > situation that has been encountered in this thread. Interesting. I'm pretty sure it didn't do that when I added the flag. Anyways making it a CONFIG is fine for me. Just don't set it on ARM. It should be set on x86 at least. -Andi
On Thu, 26 May 2011, Will Deacon wrote: > This issue seems to be caused by passing -fconserve-stack to GCC. This > was added in 8f7f5c9f ("kbuild: set -fconserve-stack option for gcc > 4.5") and as you can see from the archive: > > http://lkml.org/lkml/2009/9/20/39 > > it was thought to only have an impact on inlining decisions. Looking at > the documentation for GCC 4.6: > > -fconserve-stack > Attempt to minimize stack usage. The compiler will attempt to > use less stack space, even if that makes the program slower. This option > implies setting the ‘large-stack-frame’ parameter to 100 and the > ‘large-stack-frame-growth’ parameter to 400. > > So it sounds like we might not want to enable this blindly across all > architectures. Indeed, on ARM, it encourages the compiler to pack > variables on the stack which leads to the weird and wonderful alignment > situation that has been encountered in this thread. > > Can we remove -fconserve-stack from the top-level Makefile (or at least > make it conditional by architecture)? I think this is an orthogonal issue. My opinion is that we should use -mno-unaligned-access by default on ARM. The reason is that we've been expecting the compiler not to cause unaligned accesses for ages, and letting the compiler, for whatever reasons including things like -fconserve-stack, produce unaligned accesses behind our back is a change in behavior we might not always be prepared for. Unaligned accesses in the kernel should be rare anyway, and allowing the compiler to generate them can be allowed for selected files when proven beneficial. It is possible that -fconserve-stack is still valuable on ARM given that it is also used with -mno-unaligned-access for other things than structure packing on the stack, and therefore its merits can be debated independently from the alignment issue at hand. Nicolas
> It is possible that -fconserve-stack is still valuable on ARM given that > it is also used with -mno-unaligned-access for other things than > structure packing on the stack, and therefore its merits can be debated > independently from the alignment issue at hand. The big advantage of -fconserve-stack is that it throttles the inliner if the inlining would cause too much stack growth. This is something you likely want on ARM too, especially as code gets more and more complex. -Andi
Andi Kleen <ak@linux.intel.com> writes: >> It is possible that -fconserve-stack is still valuable on ARM given that >> it is also used with -mno-unaligned-access for other things than >> structure packing on the stack, and therefore its merits can be debated >> independently from the alignment issue at hand. > > The big advantage of -fconserve-stack is that it throttles the inliner > if the inlining would cause too much stack growth. This is something > you likely want on ARM too, especially as code gets more and more > complex. Is there no way to get that effect without also activating the aggressive packing?
On Thu, May 26, 2011 at 05:03:39PM -0400, Nicolas Pitre wrote: > It is possible that -fconserve-stack is still valuable on ARM given that > it is also used with -mno-unaligned-access for other things than > structure packing on the stack, and therefore its merits can be debated > independently from the alignment issue at hand. Catalin said in his mail "I haven't tried with -mno-unaligned-access, I suspect the variables on the stack would be aligned.". So I don't think we know enough to say whether -mno-unaligned-access avoids the stack packing.
> Catalin said in his mail "I haven't tried with -mno-unaligned-access, I > suspect the variables on the stack would be aligned.". So I don't think > we know enough to say whether -mno-unaligned-access avoids the stack > packing. It won't, the arm gcc code just checks flag_conserve_stack. IMHO it's just a gcc bug. You should report it to http://gcc.gnu.org/bugzilla As a temporary workaround you can disable it in the kernel too, but as soon as the compiler it's fixed I would recommend considering to reenable it. -Andi
On Thu, May 26, 2011 at 10:51:01PM +0100, Russell King - ARM Linux wrote: > On Thu, May 26, 2011 at 05:03:39PM -0400, Nicolas Pitre wrote: > > It is possible that -fconserve-stack is still valuable on ARM given that > > it is also used with -mno-unaligned-access for other things than > > structure packing on the stack, and therefore its merits can be debated > > independently from the alignment issue at hand. > > Catalin said in his mail "I haven't tried with -mno-unaligned-access, I > suspect the variables on the stack would be aligned.". So I don't think > we know enough to say whether -mno-unaligned-access avoids the stack > packing. OK, I tried this now: -fconserve-stack: we get unaligned accesses on the stack because the newer versions of gcc turned unaligned accesses on by default. -fconserve-stack -mno-unaligned-access: the stack variables are aligned. We probably get the benefit of -fconserve-stack as well. So as per the initial post in this thread, we could have -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As Nicolas suggested, we could compile some files with -munaligned-access (and maybe -fno-conserve-stack). I raised this with the gcc guys so they are looking into it. But it really doesn't look like a gcc bug as long as -mno-unaligned-access is taken into account.
On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote: > OK, I tried this now: > > -fconserve-stack: we get unaligned accesses on the stack because the > newer versions of gcc turned unaligned accesses on by default. > > -fconserve-stack -mno-unaligned-access: the stack variables are aligned. > We probably get the benefit of -fconserve-stack as well. > > So as per the initial post in this thread, we could have > -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As > Nicolas suggested, we could compile some files with -munaligned-access > (and maybe -fno-conserve-stack). > > I raised this with the gcc guys so they are looking into it. But it > really doesn't look like a gcc bug as long as -mno-unaligned-access is > taken into account. Ok, we need to check one last thing, and that's what the behaviour is with -mno-unaligned-access and packed structures (such as the ethernet header). If it makes no difference, then I suggest we always build with -mno-unaligned-access.
On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote: > On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote: > > OK, I tried this now: > > > > -fconserve-stack: we get unaligned accesses on the stack because the > > newer versions of gcc turned unaligned accesses on by default. > > > > -fconserve-stack -mno-unaligned-access: the stack variables are aligned. > > We probably get the benefit of -fconserve-stack as well. > > > > So as per the initial post in this thread, we could have > > -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As > > Nicolas suggested, we could compile some files with -munaligned-access > > (and maybe -fno-conserve-stack). > > > > I raised this with the gcc guys so they are looking into it. But it > > really doesn't look like a gcc bug as long as -mno-unaligned-access is > > taken into account. > > Ok, we need to check one last thing, and that's what the behaviour is > with -mno-unaligned-access and packed structures (such as the ethernet > header). If it makes no difference, then I suggest we always build > with -mno-unaligned-access. I tried some simple code below: struct test { unsigned char a[6]; unsigned long b; } __attribute__((packed)); void set(struct test *t, unsigned long v) { t->b = v; } int main(void) { struct test t; set(&t, 10); return 0; } With -mno-unaligned-access in newer toolchains, the set() function looks like this (compiled with -march=armv7): 00000000 <set>: 0: e7e7c451 ubfx ip, r1, #8, #8 4: e7e72851 ubfx r2, r1, #16, #8 8: e1a03c21 lsr r3, r1, #24 c: e5c01006 strb r1, [r0, #6] 10: e5c0c007 strb ip, [r0, #7] 14: e5c02008 strb r2, [r0, #8] 18: e5c03009 strb r3, [r0, #9] 1c: e12fff1e bx lr If I don't pass -mno-unaligned-access later toolchains use unaligned accesses by default and the set() function is more efficient: 00000000 <set>: 0: e5801006 str r1, [r0, #6] 4: e12fff1e bx lr The problem is that in addition to that we also get unaligned stack variables which are not really efficient. Either way we have a drawback somewhere. We could argue that -fconserve-stack is badly implemented on ARM.
On Fri, 2011-05-27 at 10:51 +0100, Catalin Marinas wrote: > On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote: > > On Fri, May 27, 2011 at 09:38:08AM +0100, Catalin Marinas wrote: > > > OK, I tried this now: > > > > > > -fconserve-stack: we get unaligned accesses on the stack because the > > > newer versions of gcc turned unaligned accesses on by default. > > > > > > -fconserve-stack -mno-unaligned-access: the stack variables are aligned. > > > We probably get the benefit of -fconserve-stack as well. > > > > > > So as per the initial post in this thread, we could have > > > -mno-unaligned-access on ARM always on (when CONFIG_ALIGNMENT_TRAP). As > > > Nicolas suggested, we could compile some files with -munaligned-access > > > (and maybe -fno-conserve-stack). > > > > > > I raised this with the gcc guys so they are looking into it. But it > > > really doesn't look like a gcc bug as long as -mno-unaligned-access is > > > taken into account. > > > > Ok, we need to check one last thing, and that's what the behaviour is > > with -mno-unaligned-access and packed structures (such as the ethernet > > header). If it makes no difference, then I suggest we always build > > with -mno-unaligned-access. > > I tried some simple code below: > > struct test { > unsigned char a[6]; > unsigned long b; > } __attribute__((packed)); > > void set(struct test *t, unsigned long v) > { > t->b = v; > } > > int main(void) > { > struct test t; > > set(&t, 10); > > return 0; > } > > With -mno-unaligned-access in newer toolchains, the set() function looks > like this (compiled with -march=armv7): > > 00000000 <set>: > 0: e7e7c451 ubfx ip, r1, #8, #8 > 4: e7e72851 ubfx r2, r1, #16, #8 > 8: e1a03c21 lsr r3, r1, #24 > c: e5c01006 strb r1, [r0, #6] > 10: e5c0c007 strb ip, [r0, #7] > 14: e5c02008 strb r2, [r0, #8] > 18: e5c03009 strb r3, [r0, #9] > 1c: e12fff1e bx lr > > If I don't pass -mno-unaligned-access later toolchains use unaligned > accesses by default and the set() function is more efficient: > > 00000000 <set>: > 0: e5801006 str r1, [r0, #6] > 4: e12fff1e bx lr For completeness, I tried with "unsigned short b" in the structure above hoping that the compiler would notice that it is 16-bit aligned. Unfortunately, it doesn't. Code below with -mno-unaligned-access: 00000000 <set>: 0: e1a03421 lsr r3, r1, #8 4: e5c01006 strb r1, [r0, #6] 8: e5c03007 strb r3, [r0, #7] c: e12fff1e bx lr
Hi Andi, On Thu, 2011-05-26 at 22:10 +0100, Andi Kleen wrote: > > > It is possible that -fconserve-stack is still valuable on ARM given that > > it is also used with -mno-unaligned-access for other things than > > structure packing on the stack, and therefore its merits can be debated > > independently from the alignment issue at hand. > > The big advantage of -fconserve-stack is that it throttles the inliner > if the inlining > would cause too much stack growth. This is something you likely want > on ARM too, especially as code gets more and more complex. Do you have any concrete examples of -fconserve-stack giving an overall win that isn't in the noise? The fact that the GCC documentation explicitly states that enabling the option can lead to `making the program slower' does make me question why we're enabling it in the first place. >From private conversation, the GCC guys don't seem to think this is a bug so I'm reluctant to open a bugzilla ticket. Will
Catalin Marinas <catalin.marinas@arm.com> writes: > On Fri, May 27, 2011 at 09:54:14AM +0100, Russell King - ARM Linux wrote: >> >> Ok, we need to check one last thing, and that's what the behaviour is >> with -mno-unaligned-access and packed structures (such as the ethernet >> header). If it makes no difference, then I suggest we always build >> with -mno-unaligned-access. > > I tried some simple code below: > > struct test { > unsigned char a[6]; > unsigned long b; > } __attribute__((packed)); > > void set(struct test *t, unsigned long v) > { > t->b = v; > } > > int main(void) > { > struct test t; > > set(&t, 10); > > return 0; > } > > With -mno-unaligned-access in newer toolchains, the set() function looks > like this (compiled with -march=armv7): > > 00000000 <set>: > 0: e7e7c451 ubfx ip, r1, #8, #8 > 4: e7e72851 ubfx r2, r1, #16, #8 > 8: e1a03c21 lsr r3, r1, #24 > c: e5c01006 strb r1, [r0, #6] > 10: e5c0c007 strb ip, [r0, #7] > 14: e5c02008 strb r2, [r0, #8] > 18: e5c03009 strb r3, [r0, #9] > 1c: e12fff1e bx lr > > If I don't pass -mno-unaligned-access later toolchains use unaligned > accesses by default and the set() function is more efficient: > > 00000000 <set>: > 0: e5801006 str r1, [r0, #6] > 4: e12fff1e bx lr This is certainly something we should want. Although some people expressed concerns over introducing unaligned accesses where there were previously none, I don't see how this could pose a problem as long as we make sure strict alignment checking is off. Some basic testing of code paths known to use unaligned accesses should suffice IMO. > The problem is that in addition to that we also get unaligned stack > variables which are not really efficient. Either way we have a drawback > somewhere. We could argue that -fconserve-stack is badly implemented on > ARM. Unless someone can demonstrate a clear win from -fconserve-stack, I think it's pretty obvious that this flag does more harm than good on ARM, especially in conjunction with unaligned accesses being allowed. If the stack packing could be disabled while retaining the other (presumably beneficial) effects of -fconserve-stack, it might be reconsidered.
> Do you have any concrete examples of -fconserve-stack giving an overall > win that isn't in the noise? The fact that the GCC documentation > explicitly states that enabling the option can lead to `making the > program slower' does make me question why we're enabling it in the first > place. Because the kernel has a limited stack. We had a few cases in the past where inlining blew it, especially in large ioctl switch() functions which inlined lots of others. On modern gccs it's better because it is smarter about sharing stack slots in large functions. This was also worked around with manual noinlines. But it's still far safer to tell gcc to conserve stack. I consider the ARM gcc behaviour just a bug. The thing was really only intended for the inliner (I asked for it originally) -Andi
diff --git a/arch/arm/Makefile b/arch/arm/Makefile index c7d321a..1c383d0 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -101,6 +101,10 @@ ifeq ($(CONFIG_ARM_UNWIND),y) CFLAGS_ABI +=-funwind-tables endif +ifeq ($(CONFIG_ALIGNMENT_TRAP),y) +CFLAGS_ABI +=$(call cc-option,-mno-unaligned-access,) +endif + ifeq ($(CONFIG_THUMB2_KERNEL),y) AFLAGS_AUTOIT :=$(call as-option,-Wa$(comma)-mimplicit-it=always,-Wa$(comma)-mauto-it) AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)