Message ID | alpine.DEB.2.02.1307280538460.8385@utopia.booyaka.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Paul, Cheers for sticking with this! On Sun, Jul 28, 2013 at 06:43:24AM +0100, Paul Walmsley wrote: > > Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm: > don't flush icache in switch_mm with hardware broadcasting") breaks > the boot on OMAP2430SDP with omap2plus_defconfig. Tracked to an > undefined instruction abort from the CP15 read in > cache_ops_need_broadcast(). It turns out that early ARM1136 variants > don't support several CP15 registers that later ARM cores do. > ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details. Intriguing... I wouldn't expect a cp15 read to be emitted for 1136, since the SMP_ON_UP magic should have caused is_smp() to return false. > diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h > index 6462a72..76214cb 100644 > --- a/arch/arm/include/asm/smp_plat.h > +++ b/arch/arm/include/asm/smp_plat.h > @@ -25,7 +25,6 @@ static inline bool is_smp(void) > #endif > } > > -/* all SMP configurations have the extended CPUID registers */ > #ifndef CONFIG_MMU > #define tlb_ops_need_broadcast() 0 > #else > @@ -43,6 +42,9 @@ static inline int tlb_ops_need_broadcast(void) > #else > static inline int cache_ops_need_broadcast(void) > { > + if (cpu_is_arm1136_r0()) > + return 0; > + > if (!is_smp()) > return 0; So we should have returned 0 here without exploding (this just reads a .globl initialised in head.S). Are we somehow misidentifying your 1136 as an SMP core? Will
On Sun, Jul 28, 2013 at 12:10:38PM +0100, Will Deacon wrote: > Hi Paul, > > Cheers for sticking with this! > > On Sun, Jul 28, 2013 at 06:43:24AM +0100, Paul Walmsley wrote: > > > > Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm: > > don't flush icache in switch_mm with hardware broadcasting") breaks > > the boot on OMAP2430SDP with omap2plus_defconfig. Tracked to an > > undefined instruction abort from the CP15 read in > > cache_ops_need_broadcast(). It turns out that early ARM1136 variants > > don't support several CP15 registers that later ARM cores do. > > ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details. > > Intriguing... I wouldn't expect a cp15 read to be emitted for 1136, since > the SMP_ON_UP magic should have caused is_smp() to return false. The compiler seems to be doing something quite odd with schedule(): 00000540 <__schedule>: 558: ee103ff1 mrc 15, 0, r3, cr0, cr1, {7} 560: e1a03623 lsr r3, r3, #12 564: e203300f and r3, r3, #15 570: e50b3080 str r3, [fp, #-128] ; 0x80 ... 6e8: e59f3428 ldr r3, [pc, #1064] ; b18 <__schedule+0x5d8> 6f0: e5933000 ldr r3, [r3] 6f4: e3530000 cmp r3, #0 6f8: 1a000027 bne 79c <__schedule+0x25c> 6fc: e2851f65 add r1, r5, #404 ; 0x194 700: ebfffffe bl 0 <_test_and_set_bit> 700: R_ARM_CALL _test_and_set_bit ... 79c: e51b1080 ldr r1, [fp, #-128] ; 0x80 7a0: e3510000 cmp r1, #0 7a4: 1affffd4 bne 6fc <__schedule+0x1bc> b18: R_ARM_ABS32 smp_on_up Yes - that's right, it's reading from the mcr at the very start of __schedule and storing it on the stack for just one test later on. The act of storing it on the stack and reading it back of is likely a lot more expensive than reading it from CP15 in the first place! We don't want to make the asm() itself volatile, because that means the asm() statement can't be optimised away. Adding a memory clobber to that asm seems to place it more appropriately, but again, that's not particularly desirable. Another solution would be to make both tlb_ops_need_broadcast and cache_ops_need_broadcast both be a flag test, but that introduces a memory load in all those paths no matter if we're running on SMP_ON_UP or not.
Hi Will On Sun, 28 Jul 2013, Will Deacon wrote: > On Sun, Jul 28, 2013 at 06:43:24AM +0100, Paul Walmsley wrote: > > > > Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm: > > don't flush icache in switch_mm with hardware broadcasting") breaks > > the boot on OMAP2430SDP with omap2plus_defconfig. Tracked to an > > undefined instruction abort from the CP15 read in > > cache_ops_need_broadcast(). It turns out that early ARM1136 variants > > don't support several CP15 registers that later ARM cores do. > > ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details. > > Intriguing... I wouldn't expect a cp15 read to be emitted for 1136, since > the SMP_ON_UP magic should have caused is_smp() to return false. > ... > So we should have returned 0 here without exploding (this just reads a > .globl initialised in head.S). Are we somehow misidentifying your 1136 > as an SMP core? You're right - the is_arm1136_r0() logic is completely unnecessary - it's just the 'asm volatile' that makes the difference. - Paul
On Sun, 28 Jul 2013, Russell King - ARM Linux wrote: > We don't want to make the asm() itself volatile, because that means the > asm() statement can't be optimised away. > > Adding a memory clobber to that asm seems to place it more appropriately, > but again, that's not particularly desirable. > > Another solution would be to make both tlb_ops_need_broadcast and > cache_ops_need_broadcast both be a flag test, but that introduces a > memory load in all those paths no matter if we're running on SMP_ON_UP > or not. I considered adding a HWCAP_* bit under the theory that elf_hwcap may already be cache-resident. But thought that the loss of the flag space and possible memory access vs. the additional mrc wasn't worth it. Anyway, if either of you have a preference between these three, I'd be happy to update the patch. Or feel free to take it over from here if either of you prefer. - Paul
diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h index 8c25dc4..91ccd77 100644 --- a/arch/arm/include/asm/cputype.h +++ b/arch/arm/include/asm/cputype.h @@ -76,6 +76,8 @@ #define ARM_CPU_XSCALE_ARCH_V2 0x4000 #define ARM_CPU_XSCALE_ARCH_V3 0x6000 +#define ARM_CPU_ARM1136R0 0x4107b360 + extern unsigned int processor_id; #ifdef CONFIG_CPU_CP15 @@ -89,10 +91,43 @@ extern unsigned int processor_id; __val; \ }) + +/* Workaround for missing CP15 registers on ARM1136 r0 */ +# if defined(CONFIG_CPU_V6) +/** + * cpu_is_arm1136_r0 - is the kernel running on an ARM1136 r0 core? + * + * Returns true if the kernel is running on an ARM1136 r0 core, or + * false otherwise. Callers use this to avoid undefined instruction + * aborts from CP15 accesses to registers not present on the r0 + * variant, or to detect whether certain CPU features are available. + * ARM1136JF-S TRM section 3.2.1 "Register allocation" + */ +static inline bool __attribute_const__ cpu_is_arm1136_r0(void) +{ + return ((read_cpuid(CPUID_ID) & 0xfffffff0) == ARM_CPU_ARM1136R0); +} + +/* + * The mrc in the read_cpuid_ext macro must not be reordered on ARMv6, + * else the compiler may move it before any ARM1136r0 test. + */ +# define CPUID_EXT_REORDER volatile +# else +static inline bool __attribute_const__ cpu_is_arm1136_r0(void) { return false; } +# define CPUID_EXT_REORDER +# endif + +/* + * Early ARM1136 variants don't support many CP15 registers provided + * on later cores. Users of read_cpuid_ext must ensure that it won't + * be used unless it's known that the running core provides the CP15 + * register ext_reg. See cpu_is_arm1136_r0() above. + */ #define read_cpuid_ext(ext_reg) \ ({ \ unsigned int __val; \ - asm("mrc p15, 0, %0, c0, " ext_reg \ + asm CPUID_EXT_REORDER("mrc p15, 0, %0, c0, " ext_reg \ : "=r" (__val) \ : \ : "cc"); \ diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h index 6462a72..76214cb 100644 --- a/arch/arm/include/asm/smp_plat.h +++ b/arch/arm/include/asm/smp_plat.h @@ -25,7 +25,6 @@ static inline bool is_smp(void) #endif } -/* all SMP configurations have the extended CPUID registers */ #ifndef CONFIG_MMU #define tlb_ops_need_broadcast() 0 #else @@ -43,6 +42,9 @@ static inline int tlb_ops_need_broadcast(void) #else static inline int cache_ops_need_broadcast(void) { + if (cpu_is_arm1136_r0()) + return 0; + if (!is_smp()) return 0;
Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm: don't flush icache in switch_mm with hardware broadcasting") breaks the boot on OMAP2430SDP with omap2plus_defconfig. Tracked to an undefined instruction abort from the CP15 read in cache_ops_need_broadcast(). It turns out that early ARM1136 variants don't support several CP15 registers that later ARM cores do. ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details. So, prevent cache_ops_need_broadcast() from attempting the CP15 read if the running CPU doesn't provide the register. cache_ops_need_broadcast() is a hot path function, so focus on minimizing the execution time impact. A subsequent patch will take care of the remaining cases in the current kernel. Thanks to Will Deacon for helping track this down. Signed-off-by: Paul Walmsley <paul@pwsan.com> Cc: Will Deacon <will.deacon@arm.com> --- Intended for v3.11-rc. Tested (along with the followup patch) here: http://www.pwsan.com/omap/testlogs/bisect_2430sdp_hang_v3.11-rc/20130727224434/README.txt against stock v3.11-rc2, so several boards aren't booting due to unrelated issues. arch/arm/include/asm/cputype.h | 37 ++++++++++++++++++++++++++++++++++++- arch/arm/include/asm/smp_plat.h | 4 +++- 2 files changed, 39 insertions(+), 2 deletions(-)