diff mbox

ARM: v6: avoid read_cpuid_ext() on ARM1136r0 in cache_ops_need_broadcast()

Message ID alpine.DEB.2.02.1307280538460.8385@utopia.booyaka.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paul Walmsley July 28, 2013, 5:43 a.m. UTC
Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
don't flush icache in switch_mm with hardware broadcasting") breaks
the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
undefined instruction abort from the CP15 read in
cache_ops_need_broadcast().  It turns out that early ARM1136 variants
don't support several CP15 registers that later ARM cores do.
ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details.

So, prevent cache_ops_need_broadcast() from attempting the CP15 read
if the running CPU doesn't provide the register.
cache_ops_need_broadcast() is a hot path function, so focus on
minimizing the execution time impact.  A subsequent patch will take
care of the remaining cases in the current kernel.

Thanks to Will Deacon for helping track this down.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Will Deacon <will.deacon@arm.com>
---

Intended for v3.11-rc.

Tested (along with the followup patch) here:

http://www.pwsan.com/omap/testlogs/bisect_2430sdp_hang_v3.11-rc/20130727224434/README.txt

against stock v3.11-rc2, so several boards aren't booting due to unrelated 
issues.

 arch/arm/include/asm/cputype.h  | 37 ++++++++++++++++++++++++++++++++++++-
 arch/arm/include/asm/smp_plat.h |  4 +++-
 2 files changed, 39 insertions(+), 2 deletions(-)

Comments

Will Deacon July 28, 2013, 11:10 a.m. UTC | #1
Hi Paul,

Cheers for sticking with this!

On Sun, Jul 28, 2013 at 06:43:24AM +0100, Paul Walmsley wrote:
> 
> Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> don't flush icache in switch_mm with hardware broadcasting") breaks
> the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
> undefined instruction abort from the CP15 read in
> cache_ops_need_broadcast().  It turns out that early ARM1136 variants
> don't support several CP15 registers that later ARM cores do.
> ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details.

Intriguing... I wouldn't expect a cp15 read to be emitted for 1136, since
the SMP_ON_UP magic should have caused is_smp() to return false.

> diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
> index 6462a72..76214cb 100644
> --- a/arch/arm/include/asm/smp_plat.h
> +++ b/arch/arm/include/asm/smp_plat.h
> @@ -25,7 +25,6 @@ static inline bool is_smp(void)
>  #endif
>  }
>  
> -/* all SMP configurations have the extended CPUID registers */
>  #ifndef CONFIG_MMU
>  #define tlb_ops_need_broadcast()	0
>  #else
> @@ -43,6 +42,9 @@ static inline int tlb_ops_need_broadcast(void)
>  #else
>  static inline int cache_ops_need_broadcast(void)
>  {
> +	if (cpu_is_arm1136_r0())
> +		return 0;
> +
>  	if (!is_smp())
>  		return 0;

So we should have returned 0 here without exploding (this just reads a
.globl initialised in head.S). Are we somehow misidentifying your 1136
as an SMP core?

Will
Russell King - ARM Linux July 28, 2013, 11:52 a.m. UTC | #2
On Sun, Jul 28, 2013 at 12:10:38PM +0100, Will Deacon wrote:
> Hi Paul,
> 
> Cheers for sticking with this!
> 
> On Sun, Jul 28, 2013 at 06:43:24AM +0100, Paul Walmsley wrote:
> > 
> > Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> > don't flush icache in switch_mm with hardware broadcasting") breaks
> > the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
> > undefined instruction abort from the CP15 read in
> > cache_ops_need_broadcast().  It turns out that early ARM1136 variants
> > don't support several CP15 registers that later ARM cores do.
> > ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details.
> 
> Intriguing... I wouldn't expect a cp15 read to be emitted for 1136, since
> the SMP_ON_UP magic should have caused is_smp() to return false.

The compiler seems to be doing something quite odd with schedule():

00000540 <__schedule>:
 558:   ee103ff1        mrc     15, 0, r3, cr0, cr1, {7}
 560:   e1a03623        lsr     r3, r3, #12
 564:   e203300f        and     r3, r3, #15
 570:   e50b3080        str     r3, [fp, #-128] ; 0x80
...
 6e8:   e59f3428        ldr     r3, [pc, #1064] ; b18 <__schedule+0x5d8>
 6f0:   e5933000        ldr     r3, [r3]
 6f4:   e3530000        cmp     r3, #0
 6f8:   1a000027        bne     79c <__schedule+0x25c>
 6fc:   e2851f65        add     r1, r5, #404    ; 0x194
 700:   ebfffffe        bl      0 <_test_and_set_bit>
                        700: R_ARM_CALL _test_and_set_bit
...
 79c:   e51b1080        ldr     r1, [fp, #-128] ; 0x80
 7a0:   e3510000        cmp     r1, #0
 7a4:   1affffd4        bne     6fc <__schedule+0x1bc>
                        b18: R_ARM_ABS32        smp_on_up

Yes - that's right, it's reading from the mcr at the very start of
__schedule and storing it on the stack for just one test later on.  The
act of storing it on the stack and reading it back of is likely a lot
more expensive than reading it from CP15 in the first place!

We don't want to make the asm() itself volatile, because that means the
asm() statement can't be optimised away.

Adding a memory clobber to that asm seems to place it more appropriately,
but again, that's not particularly desirable.

Another solution would be to make both tlb_ops_need_broadcast and
cache_ops_need_broadcast both be a flag test, but that introduces a
memory load in all those paths no matter if we're running on SMP_ON_UP
or not.
Paul Walmsley July 28, 2013, 7:47 p.m. UTC | #3
Hi Will

On Sun, 28 Jul 2013, Will Deacon wrote:

> On Sun, Jul 28, 2013 at 06:43:24AM +0100, Paul Walmsley wrote:
> > 
> > Commit 621a0147d5c921f4cc33636ccd0602ad5d7cbfbc ("ARM: 7757/1: mm:
> > don't flush icache in switch_mm with hardware broadcasting") breaks
> > the boot on OMAP2430SDP with omap2plus_defconfig.  Tracked to an
> > undefined instruction abort from the CP15 read in
> > cache_ops_need_broadcast().  It turns out that early ARM1136 variants
> > don't support several CP15 registers that later ARM cores do.
> > ARM1136JF-S TRM section 3.2.1 "Register allocation" has the details.
> 
> Intriguing... I wouldn't expect a cp15 read to be emitted for 1136, since
> the SMP_ON_UP magic should have caused is_smp() to return false.
> 

...

> So we should have returned 0 here without exploding (this just reads a
> .globl initialised in head.S). Are we somehow misidentifying your 1136
> as an SMP core?

You're right - the is_arm1136_r0() logic is completely unnecessary - it's 
just the 'asm volatile' that makes the difference.

- Paul
Paul Walmsley July 28, 2013, 7:56 p.m. UTC | #4
On Sun, 28 Jul 2013, Russell King - ARM Linux wrote:

> We don't want to make the asm() itself volatile, because that means the 
> asm() statement can't be optimised away.
> 
> Adding a memory clobber to that asm seems to place it more appropriately,
> but again, that's not particularly desirable.
> 
> Another solution would be to make both tlb_ops_need_broadcast and
> cache_ops_need_broadcast both be a flag test, but that introduces a
> memory load in all those paths no matter if we're running on SMP_ON_UP
> or not.

I considered adding a HWCAP_* bit under the theory that elf_hwcap may 
already be cache-resident.  But thought that the loss of the flag space 
and possible memory access vs. the additional mrc wasn't worth it.

Anyway, if either of you have a preference between these three, I'd be 
happy to update the patch.  Or feel free to take it over from here if 
either of you prefer.


- Paul
diff mbox

Patch

diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
index 8c25dc4..91ccd77 100644
--- a/arch/arm/include/asm/cputype.h
+++ b/arch/arm/include/asm/cputype.h
@@ -76,6 +76,8 @@ 
 #define ARM_CPU_XSCALE_ARCH_V2		0x4000
 #define ARM_CPU_XSCALE_ARCH_V3		0x6000
 
+#define ARM_CPU_ARM1136R0		0x4107b360
+
 extern unsigned int processor_id;
 
 #ifdef CONFIG_CPU_CP15
@@ -89,10 +91,43 @@  extern unsigned int processor_id;
 		__val;							\
 	})
 
+
+/* Workaround for missing CP15 registers on ARM1136 r0 */
+# if defined(CONFIG_CPU_V6)
+/**
+ * cpu_is_arm1136_r0 - is the kernel running on an ARM1136 r0 core?
+ *
+ * Returns true if the kernel is running on an ARM1136 r0 core, or
+ * false otherwise.  Callers use this to avoid undefined instruction
+ * aborts from CP15 accesses to registers not present on the r0
+ * variant, or to detect whether certain CPU features are available.
+ * ARM1136JF-S TRM section 3.2.1 "Register allocation"
+ */
+static inline bool __attribute_const__ cpu_is_arm1136_r0(void)
+{
+	return ((read_cpuid(CPUID_ID) & 0xfffffff0) == ARM_CPU_ARM1136R0);
+}
+
+/*
+ * The mrc in the read_cpuid_ext macro must not be reordered on ARMv6,
+ * else the compiler may move it before any ARM1136r0 test.
+ */
+# define CPUID_EXT_REORDER	volatile
+# else
+static inline bool __attribute_const__ cpu_is_arm1136_r0(void) { return false; }
+# define CPUID_EXT_REORDER
+# endif
+
+/*
+ * Early ARM1136 variants don't support many CP15 registers provided
+ * on later cores.  Users of read_cpuid_ext must ensure that it won't
+ * be used unless it's known that the running core provides the CP15
+ * register ext_reg.  See cpu_is_arm1136_r0() above.
+ */
 #define read_cpuid_ext(ext_reg)						\
 	({								\
 		unsigned int __val;					\
-		asm("mrc	p15, 0, %0, c0, " ext_reg		\
+		asm CPUID_EXT_REORDER("mrc p15, 0, %0, c0, " ext_reg	\
 		    : "=r" (__val)					\
 		    :							\
 		    : "cc");						\
diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
index 6462a72..76214cb 100644
--- a/arch/arm/include/asm/smp_plat.h
+++ b/arch/arm/include/asm/smp_plat.h
@@ -25,7 +25,6 @@  static inline bool is_smp(void)
 #endif
 }
 
-/* all SMP configurations have the extended CPUID registers */
 #ifndef CONFIG_MMU
 #define tlb_ops_need_broadcast()	0
 #else
@@ -43,6 +42,9 @@  static inline int tlb_ops_need_broadcast(void)
 #else
 static inline int cache_ops_need_broadcast(void)
 {
+	if (cpu_is_arm1136_r0())
+		return 0;
+
 	if (!is_smp())
 		return 0;