diff mbox

[3/4] ARM: mm: kill unused TLB_CAN_READ_FROM_L1_CACHE and use ALT_SMP instead

Message ID 1364235581-17900-4-git-send-email-will.deacon@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Will Deacon March 25, 2013, 6:19 p.m. UTC
Many ARMv7 cores have hardware page table walkers that can read the L1
cache. This is discoverable from the ID_MMFR3 register, although this
can be expensive to access from the low-level set_pte functions and is a
pain to cache, particularly with multi-cluster systems.

A useful observation is that the multi-processing extensions for ARMv7
require coherent table walks, meaning that we can make use of ALT_SMP
patching in proc-v7-* to patch away the cache flush safely for these
cores.

Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/tlbflush.h | 2 +-
 arch/arm/mm/proc-v6.S           | 2 --
 arch/arm/mm/proc-v7-2level.S    | 3 ++-
 arch/arm/mm/proc-v7-3level.S    | 3 ++-
 arch/arm/mm/proc-v7.S           | 4 ++--
 5 files changed, 7 insertions(+), 7 deletions(-)

Comments

Catalin Marinas March 27, 2013, 10:53 a.m. UTC | #1
On Mon, Mar 25, 2013 at 06:19:40PM +0000, Will Deacon wrote:
> Many ARMv7 cores have hardware page table walkers that can read the L1
> cache. This is discoverable from the ID_MMFR3 register, although this
> can be expensive to access from the low-level set_pte functions and is a
> pain to cache, particularly with multi-cluster systems.
> 
> A useful observation is that the multi-processing extensions for ARMv7
> require coherent table walks, meaning that we can make use of ALT_SMP
> patching in proc-v7-* to patch away the cache flush safely for these
> cores.
> 
> Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

There are some pmd flushing functions we should target as well
(flush_pmd_entry, clean_pmd_entry) in this patch or a new one.
Will Deacon March 27, 2013, 12:20 p.m. UTC | #2
On Wed, Mar 27, 2013 at 10:53:49AM +0000, Catalin Marinas wrote:
> On Mon, Mar 25, 2013 at 06:19:40PM +0000, Will Deacon wrote:
> > Many ARMv7 cores have hardware page table walkers that can read the L1
> > cache. This is discoverable from the ID_MMFR3 register, although this
> > can be expensive to access from the low-level set_pte functions and is a
> > pain to cache, particularly with multi-cluster systems.
> > 
> > A useful observation is that the multi-processing extensions for ARMv7
> > require coherent table walks, meaning that we can make use of ALT_SMP
> > patching in proc-v7-* to patch away the cache flush safely for these
> > cores.
> > 
> > Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> There are some pmd flushing functions we should target as well
> (flush_pmd_entry, clean_pmd_entry) in this patch or a new one.

I already took care of those by avoiding the TLB_DCLEAN flag for v7 SMP.

Will
Gregory CLEMENT May 15, 2013, 1:18 p.m. UTC | #3
Hi Will,

On 03/25/2013 07:19 PM, Will Deacon wrote:
> Many ARMv7 cores have hardware page table walkers that can read the L1
> cache. This is discoverable from the ID_MMFR3 register, although this
> can be expensive to access from the low-level set_pte functions and is a
> pain to cache, particularly with multi-cluster systems.
> 
> A useful observation is that the multi-processing extensions for ARMv7
> require coherent table walks, meaning that we can make use of ALT_SMP
> patching in proc-v7-* to patch away the cache flush safely for these
> cores.

I encountered a regression with 3.10-rc1 on the Armada 370 based boards.
With the 3.10-rc1 they hang during auto testy of the xor engine which are
mainly DMA transfers. If I revert this patch, it no more hang. I found this
by using bisect, it was not obvious at all for me that this patch may have
cause this regression.
The issue appear in SMP and in UP. However I think that  the PJ4B-v7 used in
 the Armada 370 are not MP capable.

I made some investigation. And in UP if I remove the line:
	ALT_UP(W(nop))

at the begining of the cpu_v7_dcache_clean_are macro located in
arch/arm/mm/proc-v7.S

Then the kernel boot again. It is not surprising because in this case
we found the same generated code that before this patch was applied.

now I don't really understand why a W(nop) will cause this issue.

In SMP mode even with this line removed the kernel hang, but in this case
I am not sure of what happen exactly and how the .alt.smp.init section is
used.

I don't know if it is relevant but I tested with these 2 version of gcc:
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
and
gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-1ubuntu1)

I hope you will find some explanation and solution to this bug, because currently
the only solution I have is to revert this patch.

Thanks,
Gregory
> 
> Reported-by: Albin Tonnerre <Albin.Tonnerre@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm/include/asm/tlbflush.h | 2 +-
>  arch/arm/mm/proc-v6.S           | 2 --
>  arch/arm/mm/proc-v7-2level.S    | 3 ++-
>  arch/arm/mm/proc-v7-3level.S    | 3 ++-
>  arch/arm/mm/proc-v7.S           | 4 ++--
>  5 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> index c7cdb59..42d155e 100644
> --- a/arch/arm/include/asm/tlbflush.h
> +++ b/arch/arm/include/asm/tlbflush.h
> @@ -169,7 +169,7 @@
>  # define v6wbi_always_flags	(-1UL)
>  #endif
>  
> -#define v7wbi_tlb_flags_smp	(TLB_WB | TLB_DCLEAN | TLB_BARRIER | \
> +#define v7wbi_tlb_flags_smp	(TLB_WB | TLB_BARRIER | \
>  				 TLB_V6_U_FULL | TLB_V6_U_PAGE | \
>  				 TLB_V6_U_ASID | TLB_V6_BP | \
>  				 TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | \
> diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
> index bcaaa8d..a286d47 100644
> --- a/arch/arm/mm/proc-v6.S
> +++ b/arch/arm/mm/proc-v6.S
> @@ -80,12 +80,10 @@ ENTRY(cpu_v6_do_idle)
>  	mov	pc, lr
>  
>  ENTRY(cpu_v6_dcache_clean_area)
> -#ifndef TLB_CAN_READ_FROM_L1_CACHE
>  1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D entry
>  	add	r0, r0, #D_CACHE_LINE_SIZE
>  	subs	r1, r1, #D_CACHE_LINE_SIZE
>  	bhi	1b
> -#endif
>  	mov	pc, lr
>  
>  /*
> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
> index 78f520b..9704097 100644
> --- a/arch/arm/mm/proc-v7-2level.S
> +++ b/arch/arm/mm/proc-v7-2level.S
> @@ -110,7 +110,8 @@ ENTRY(cpu_v7_set_pte_ext)
>   ARM(	str	r3, [r0, #2048]! )
>   THUMB(	add	r0, r0, #2048 )
>   THUMB(	str	r3, [r0] )
> -	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
> +	ALT_SMP(mov	pc,lr)
> +	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
>  #endif
>  	mov	pc, lr
>  ENDPROC(cpu_v7_set_pte_ext)
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 6ffd78c..363027e 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -73,7 +73,8 @@ ENTRY(cpu_v7_set_pte_ext)
>  	tst	r3, #1 << (55 - 32)		@ L_PTE_DIRTY
>  	orreq	r2, #L_PTE_RDONLY
>  1:	strd	r2, r3, [r0]
> -	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
> +	ALT_SMP(mov	pc, lr)
> +	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
>  #endif
>  	mov	pc, lr
>  ENDPROC(cpu_v7_set_pte_ext)
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 3a3c015..37716b0 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -75,14 +75,14 @@ ENTRY(cpu_v7_do_idle)
>  ENDPROC(cpu_v7_do_idle)
>  
>  ENTRY(cpu_v7_dcache_clean_area)
> -#ifndef TLB_CAN_READ_FROM_L1_CACHE
> +	ALT_SMP(mov	pc, lr)			@ MP extensions imply L1 PTW
> +	ALT_UP(W(nop))
>  	dcache_line_size r2, r3
>  1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D entry
>  	add	r0, r0, r2
>  	subs	r1, r1, r2
>  	bhi	1b
>  	dsb
> -#endif
>  	mov	pc, lr
>  ENDPROC(cpu_v7_dcache_clean_area)
>  
>
Will Deacon May 15, 2013, 1:41 p.m. UTC | #4
On Wed, May 15, 2013 at 02:18:53PM +0100, Gregory CLEMENT wrote:
> Hi Will,

Hi Gregory,

> On 03/25/2013 07:19 PM, Will Deacon wrote:
> > Many ARMv7 cores have hardware page table walkers that can read the L1
> > cache. This is discoverable from the ID_MMFR3 register, although this
> > can be expensive to access from the low-level set_pte functions and is a
> > pain to cache, particularly with multi-cluster systems.
> > 
> > A useful observation is that the multi-processing extensions for ARMv7
> > require coherent table walks, meaning that we can make use of ALT_SMP
> > patching in proc-v7-* to patch away the cache flush safely for these
> > cores.
> 
> I encountered a regression with 3.10-rc1 on the Armada 370 based boards.
> With the 3.10-rc1 they hang during auto testy of the xor engine which are
> mainly DMA transfers. If I revert this patch, it no more hang. I found this
> by using bisect, it was not obvious at all for me that this patch may have
> cause this regression.

Is this using dmatest.ko, or a different test program?

> The issue appear in SMP and in UP. However I think that  the PJ4B-v7 used in
>  the Armada 370 are not MP capable.

Ok, so the ALT_UP case should be emitted after patching, correct?

> I made some investigation. And in UP if I remove the line:
> 	ALT_UP(W(nop))

Did you remove the ALT_SMP case as well? You could also try making the
ALT_SMP case use W(nop) too and see if it changes anything.

> at the begining of the cpu_v7_dcache_clean_are macro located in
> arch/arm/mm/proc-v7.S
> 
> Then the kernel boot again. It is not surprising because in this case
> we found the same generated code that before this patch was applied.
> 
> now I don't really understand why a W(nop) will cause this issue.

No, that sounds weird. Can you inspect the functions using JTAG after the smp
patching code has executed?

> In SMP mode even with this line removed the kernel hang, but in this case
> I am not sure of what happen exactly and how the .alt.smp.init section is
> used.

If you don't have both of the alternatives, things will go wrong.

> I hope you will find some explanation and solution to this bug, because currently
> the only solution I have is to revert this patch.

Let's not jump to that just yet!

Cheers,

Will
Gregory CLEMENT May 15, 2013, 1:54 p.m. UTC | #5
On 05/15/2013 03:41 PM, Will Deacon wrote:
> On Wed, May 15, 2013 at 02:18:53PM +0100, Gregory CLEMENT wrote:
>> Hi Will,
> 
> Hi Gregory,
> 
>> On 03/25/2013 07:19 PM, Will Deacon wrote:
>>> Many ARMv7 cores have hardware page table walkers that can read the L1
>>> cache. This is discoverable from the ID_MMFR3 register, although this
>>> can be expensive to access from the low-level set_pte functions and is a
>>> pain to cache, particularly with multi-cluster systems.
>>>
>>> A useful observation is that the multi-processing extensions for ARMv7
>>> require coherent table walks, meaning that we can make use of ALT_SMP
>>> patching in proc-v7-* to patch away the cache flush safely for these
>>> cores.
>>
>> I encountered a regression with 3.10-rc1 on the Armada 370 based boards.
>> With the 3.10-rc1 they hang during auto testy of the xor engine which are
>> mainly DMA transfers. If I revert this patch, it no more hang. I found this
>> by using bisect, it was not obvious at all for me that this patch may have
>> cause this regression.
> 
> Is this using dmatest.ko, or a different test program?

No they are self-test from the mv_xor driver

> 
>> The issue appear in SMP and in UP. However I think that  the PJ4B-v7 used in
>>  the Armada 370 are not MP capable.
> 
> Ok, so the ALT_UP case should be emitted after patching, correct?

Indeed it was I excepted but I didn't check (I don't know how to do)

> 
>> I made some investigation. And in UP if I remove the line:
>> 	ALT_UP(W(nop))
> 
> Did you remove the ALT_SMP case as well? You could also try making the
> ALT_SMP case use W(nop) too and see if it changes anything.

OK I will try it.

> 
>> at the begining of the cpu_v7_dcache_clean_are macro located in
>> arch/arm/mm/proc-v7.S
>>
>> Then the kernel boot again. It is not surprising because in this case
>> we found the same generated code that before this patch was applied.
>>
>> now I don't really understand why a W(nop) will cause this issue.
> 
> No, that sounds weird. Can you inspect the functions using JTAG after the smp
> patching code has executed?

No I can't: I don't have any JTAG :/

> 
>> In SMP mode even with this line removed the kernel hang, but in this case
>> I am not sure of what happen exactly and how the .alt.smp.init section is
>> used.
> 
> If you don't have both of the alternatives, things will go wrong.

For my own culture, how ALT_UP and ALT_SMP work in SMP case?
When I disassembled the proc-v7.o, I saw that the SMP variant of the code were
written. How the kernel switch to the UP version of the code?

> 
>> I hope you will find some explanation and solution to this bug, because currently
>> the only solution I have is to revert this patch.
> 
> Let's not jump to that just yet!

Sure I hope we will find a fix for that.
Thanks,

Gregory
Will Deacon May 15, 2013, 2:06 p.m. UTC | #6
On Wed, May 15, 2013 at 02:54:14PM +0100, Gregory CLEMENT wrote:
> On 05/15/2013 03:41 PM, Will Deacon wrote:
> > Is this using dmatest.ko, or a different test program?
> 
> No they are self-test from the mv_xor driver

Ok. And the CPU locks up, rather than the DMA master?

> > Ok, so the ALT_UP case should be emitted after patching, correct?
> 
> Indeed it was I excepted but I didn't check (I don't know how to do)

If you don't have JTAG, that's trickier. Hopefully my other suggestion will
help.

> >> I made some investigation. And in UP if I remove the line:
> >> 	ALT_UP(W(nop))
> > 
> > Did you remove the ALT_SMP case as well? You could also try making the
> > ALT_SMP case use W(nop) too and see if it changes anything.
> 
> OK I will try it.

You could also try deleting both of the ALT_* lines and just putting a
W(nop) in there directly.

> > If you don't have both of the alternatives, things will go wrong.
> 
> For my own culture, how ALT_UP and ALT_SMP work in SMP case?
> When I disassembled the proc-v7.o, I saw that the SMP variant of the code were
> written. How the kernel switch to the UP version of the code?

Early in boot (head.S:__fixup_smp), we detect whether the CPU has the MP
extensions then, if we're actually UP but the kernel has CONFIG_SMP=y, we
walk over the .alt.smp.init section looking at each entry in there. The
ALT_UP macro spits out an (address, instruction) pair, so in
__do_fixup_smp_on_up, we store the instruction to the address for each pair,
replacing the SMP instruction which sat there in the compiled image.

It could be that the CPUID checks are failing on your Marvell part. Can you
tell me what you have in your mpidr (mrc p15, 0, Rd, c0, c0, 5) and also
your midr (mrc p15, 0, Rd, c0, c0) please?

Cheers,

Will
Gregory CLEMENT May 15, 2013, 2:46 p.m. UTC | #7
On 05/15/2013 04:06 PM, Will Deacon wrote:
> On Wed, May 15, 2013 at 02:54:14PM +0100, Gregory CLEMENT wrote:
>> On 05/15/2013 03:41 PM, Will Deacon wrote:
>>> Is this using dmatest.ko, or a different test program?
>>
>> No they are self-test from the mv_xor driver
> 
> Ok. And the CPU locks up, rather than the DMA master?
> 
>>> Ok, so the ALT_UP case should be emitted after patching, correct?
>>
>> Indeed it was I excepted but I didn't check (I don't know how to do)
> 
> If you don't have JTAG, that's trickier. Hopefully my other suggestion will
> help.
> 
>>>> I made some investigation. And in UP if I remove the line:
>>>> 	ALT_UP(W(nop))
>>>
>>> Did you remove the ALT_SMP case as well? You could also try making the
>>> ALT_SMP case use W(nop) too and see if it changes anything.
>>
>> OK I will try it.
> 
> You could also try deleting both of the ALT_* lines and just putting a
> W(nop) in there directly.

If I just delete the both of the ALT_* lines it no more hangs.
If I put a  W(nop) instead it hangs.

> 
>>> If you don't have both of the alternatives, things will go wrong.
>>
>> For my own culture, how ALT_UP and ALT_SMP work in SMP case?
>> When I disassembled the proc-v7.o, I saw that the SMP variant of the code were
>> written. How the kernel switch to the UP version of the code?
> 
> Early in boot (head.S:__fixup_smp), we detect whether the CPU has the MP
> extensions then, if we're actually UP but the kernel has CONFIG_SMP=y, we
> walk over the .alt.smp.init section looking at each entry in there. The
> ALT_UP macro spits out an (address, instruction) pair, so in
> __do_fixup_smp_on_up, we store the instruction to the address for each pair,
> replacing the SMP instruction which sat there in the compiled image.
> 
> It could be that the CPUID checks are failing on your Marvell part. Can you
> tell me what you have in your mpidr (mrc p15, 0, Rd, c0, c0, 5) and also
> your midr (mrc p15, 0, Rd, c0, c0) please?

mpidr= 0x0
midr= 0x561F5811

> 
> Cheers,
> 
> Will
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Will Deacon May 15, 2013, 3:04 p.m. UTC | #8
On Wed, May 15, 2013 at 03:46:20PM +0100, Gregory CLEMENT wrote:
> On 05/15/2013 04:06 PM, Will Deacon wrote:
> > You could also try deleting both of the ALT_* lines and just putting a
> > W(nop) in there directly.
> 
> If I just delete the both of the ALT_* lines it no more hangs.
> If I put a  W(nop) instead it hangs.

Wow. This doesn't sound good for your CPU and you might want to check with
the Marvell guys...

Extra things you could try:

	- Try adding a 16-bit nop instead (remove the W, build thumb-2 and
	  double check in your diasassembly)

	- Try adding the W(nop) to other places in the kernel and see if you
	  can tickle the lock-up elsewhere.

	- Can you reproduce on the Armada XP? (since I have access to one of
	  those)

> > Early in boot (head.S:__fixup_smp), we detect whether the CPU has the MP
> > extensions then, if we're actually UP but the kernel has CONFIG_SMP=y, we
> > walk over the .alt.smp.init section looking at each entry in there. The
> > ALT_UP macro spits out an (address, instruction) pair, so in
> > __do_fixup_smp_on_up, we store the instruction to the address for each pair,
> > replacing the SMP instruction which sat there in the compiled image.
> > 
> > It could be that the CPUID checks are failing on your Marvell part. Can you
> > tell me what you have in your mpidr (mrc p15, 0, Rd, c0, c0, 5) and also
> > your midr (mrc p15, 0, Rd, c0, c0) please?
> 
> mpidr= 0x0
> midr= 0x561F5811

Should be fine, but it doesn't look like the patching is the issue here.

Will
Gregory CLEMENT May 15, 2013, 3:36 p.m. UTC | #9
On 05/15/2013 05:04 PM, Will Deacon wrote:
> On Wed, May 15, 2013 at 03:46:20PM +0100, Gregory CLEMENT wrote:
>> On 05/15/2013 04:06 PM, Will Deacon wrote:
>>> You could also try deleting both of the ALT_* lines and just putting a
>>> W(nop) in there directly.
>>
>> If I just delete the both of the ALT_* lines it no more hangs.
>> If I put a  W(nop) instead it hangs.

It also hang with a simple nop by the way

> 
> Wow. This doesn't sound good for your CPU and you might want to check with
> the Marvell guys...
> 
> Extra things you could try:
> 
> 	- Try adding a 16-bit nop instead (remove the W, build thumb-2 and
> 	  double check in your diasassembly)
> 
> 	- Try adding the W(nop) to other places in the kernel and see if you
> 	  can tickle the lock-up elsewhere.

I managed to add  W(nop) elsewhere in the kernel without getting any lock-up.

Is the fact that this nop is the first instruction of the macro could have an
influence ?


> 
> 	- Can you reproduce on the Armada XP? (since I have access to one of
> 	  those)

No on Armada XP I don't have this kind of problem even on UP.

> 
>>> Early in boot (head.S:__fixup_smp), we detect whether the CPU has the MP
>>> extensions then, if we're actually UP but the kernel has CONFIG_SMP=y, we
>>> walk over the .alt.smp.init section looking at each entry in there. The
>>> ALT_UP macro spits out an (address, instruction) pair, so in
>>> __do_fixup_smp_on_up, we store the instruction to the address for each pair,
>>> replacing the SMP instruction which sat there in the compiled image.
>>>
>>> It could be that the CPUID checks are failing on your Marvell part. Can you
>>> tell me what you have in your mpidr (mrc p15, 0, Rd, c0, c0, 5) and also
>>> your midr (mrc p15, 0, Rd, c0, c0) please?
>>
>> mpidr= 0x0
>> midr= 0x561F5811
> 
> Should be fine, but it doesn't look like the patching is the issue here.
> 
> Will
>
Will Deacon May 15, 2013, 3:41 p.m. UTC | #10
On Wed, May 15, 2013 at 04:36:43PM +0100, Gregory CLEMENT wrote:
> On 05/15/2013 05:04 PM, Will Deacon wrote:
> > On Wed, May 15, 2013 at 03:46:20PM +0100, Gregory CLEMENT wrote:
> >> On 05/15/2013 04:06 PM, Will Deacon wrote:
> >>> You could also try deleting both of the ALT_* lines and just putting a
> >>> W(nop) in there directly.
> >>
> >> If I just delete the both of the ALT_* lines it no more hangs.
> >> If I put a  W(nop) instead it hangs.
> 
> It also hang with a simple nop by the way

Ok. Have you tried adding a different instruction (mov r0, r0, for example)?

> > 
> > Wow. This doesn't sound good for your CPU and you might want to check with
> > the Marvell guys...
> > 
> > Extra things you could try:
> > 
> > 	- Try adding a 16-bit nop instead (remove the W, build thumb-2 and
> > 	  double check in your diasassembly)
> > 
> > 	- Try adding the W(nop) to other places in the kernel and see if you
> > 	  can tickle the lock-up elsewhere.
> 
> I managed to add  W(nop) elsewhere in the kernel without getting any lock-up.
> 
> Is the fact that this nop is the first instruction of the macro could have an
> influence ?

I can't think why -- the macro is just expanded inline during assembly.

> 
> > 
> > 	- Can you reproduce on the Armada XP? (since I have access to one of
> > 	  those)
> 
> No on Armada XP I don't have this kind of problem even on UP.

Is that compiling with CONFIG_SMP=n?

Will
Gregory CLEMENT May 15, 2013, 4:29 p.m. UTC | #11
On 05/15/2013 05:41 PM, Will Deacon wrote:
> On Wed, May 15, 2013 at 04:36:43PM +0100, Gregory CLEMENT wrote:
>> On 05/15/2013 05:04 PM, Will Deacon wrote:
>>> On Wed, May 15, 2013 at 03:46:20PM +0100, Gregory CLEMENT wrote:
>>>> On 05/15/2013 04:06 PM, Will Deacon wrote:
>>>>> You could also try deleting both of the ALT_* lines and just putting a
>>>>> W(nop) in there directly.
>>>>
>>>> If I just delete the both of the ALT_* lines it no more hangs.
>>>> If I put a  W(nop) instead it hangs.
>>
>> It also hang with a simple nop by the way
> 
> Ok. Have you tried adding a different instruction (mov r0, r0, for example)?

it hanged again :(

I also try to boot a kernel in Thumb2, it doesn't hang in the same place but the kernel
crash latter

> 
>>>
>>> Wow. This doesn't sound good for your CPU and you might want to check with
>>> the Marvell guys...
>>>
>>> Extra things you could try:
>>>
>>> 	- Try adding a 16-bit nop instead (remove the W, build thumb-2 and
>>> 	  double check in your diasassembly)
>>>
>>> 	- Try adding the W(nop) to other places in the kernel and see if you
>>> 	  can tickle the lock-up elsewhere.
>>
>> I managed to add  W(nop) elsewhere in the kernel without getting any lock-up.
>>
>> Is the fact that this nop is the first instruction of the macro could have an
>> influence ?
> 
> I can't think why -- the macro is just expanded inline during assembly.

If I put anything before the "dcache_line_size r2, r3" line the kernel hang, but
just after it's ok.
> 
>>
>>>
>>> 	- Can you reproduce on the Armada XP? (since I have access to one of
>>> 	  those)
>>
>> No on Armada XP I don't have this kind of problem even on UP.
> 
> Is that compiling with CONFIG_SMP=n?
Yes

> 
> Will
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Will Deacon May 15, 2013, 4:48 p.m. UTC | #12
On Wed, May 15, 2013 at 05:29:03PM +0100, Gregory CLEMENT wrote:
> On 05/15/2013 05:41 PM, Will Deacon wrote:
> > On Wed, May 15, 2013 at 04:36:43PM +0100, Gregory CLEMENT wrote:
> >> On 05/15/2013 05:04 PM, Will Deacon wrote:
> >>> On Wed, May 15, 2013 at 03:46:20PM +0100, Gregory CLEMENT wrote:
> >>>> On 05/15/2013 04:06 PM, Will Deacon wrote:
> >>>>> You could also try deleting both of the ALT_* lines and just putting a
> >>>>> W(nop) in there directly.
> >>>>
> >>>> If I just delete the both of the ALT_* lines it no more hangs.
> >>>> If I put a  W(nop) instead it hangs.
> >>
> >> It also hang with a simple nop by the way
> > 
> > Ok. Have you tried adding a different instruction (mov r0, r0, for example)?
> 
> it hanged again :(
> 
> I also try to boot a kernel in Thumb2, it doesn't hang in the same place but the kernel
> crash latter

Crikey, that's not much fun. My last idea is to try sticking an isb
instruction in there. Does that make any difference?

Can you get access to another 370 board and see if the problem reproduces
there? Is there any possibility of getting JTAG (again, perhaps on a
different board)?

Failing that, I think you need to pester the CPU guys and/or the board guys
to help you debug this.

Will
Russell King - ARM Linux May 15, 2013, 5:16 p.m. UTC | #13
On Wed, May 15, 2013 at 05:48:43PM +0100, Will Deacon wrote:
> On Wed, May 15, 2013 at 05:29:03PM +0100, Gregory CLEMENT wrote:
> > On 05/15/2013 05:41 PM, Will Deacon wrote:
> > > On Wed, May 15, 2013 at 04:36:43PM +0100, Gregory CLEMENT wrote:
> > >> On 05/15/2013 05:04 PM, Will Deacon wrote:
> > >>> On Wed, May 15, 2013 at 03:46:20PM +0100, Gregory CLEMENT wrote:
> > >>>> On 05/15/2013 04:06 PM, Will Deacon wrote:
> > >>>>> You could also try deleting both of the ALT_* lines and just putting a
> > >>>>> W(nop) in there directly.
> > >>>>
> > >>>> If I just delete the both of the ALT_* lines it no more hangs.
> > >>>> If I put a  W(nop) instead it hangs.
> > >>
> > >> It also hang with a simple nop by the way
> > > 
> > > Ok. Have you tried adding a different instruction (mov r0, r0, for example)?
> > 
> > it hanged again :(
> > 
> > I also try to boot a kernel in Thumb2, it doesn't hang in the same place but the kernel
> > crash latter
> 
> Crikey, that's not much fun. My last idea is to try sticking an isb
> instruction in there. Does that make any difference?
> 
> Can you get access to another 370 board and see if the problem reproduces
> there? Is there any possibility of getting JTAG (again, perhaps on a
> different board)?
> 
> Failing that, I think you need to pester the CPU guys and/or the board guys
> to help you debug this.

I've just tried it on the cubox (Dove, Armada 510) which has the xor
engines enabled in the config.  I see in the boot messages:

mv_xor mv_xor.0: Marvell XOR driver
mv_xor mv_xor.0: Marvell XOR: ( xor cpy )
mv_xor mv_xor.0: Marvell XOR: ( xor fill cpy )
mv_xor mv_xor.1: Marvell XOR driver
mv_xor mv_xor.1: Marvell XOR: ( xor cpy )
mv_xor mv_xor.1: Marvell XOR: ( xor fill cpy )

and /proc/interrupts shows that the interrupts have fired:

 39:          2  orion_irq  mv_xor.0
 40:          2  orion_irq  mv_xor.0
 42:          2  orion_irq  mv_xor.1
 43:          2  orion_irq  mv_xor.1

on each boot.  So I assume that they're doing this self-test.

Anyway, I've added the 'nop' to cpu_v7_dcache_clean_area, and it boots
just fine.  The CPU is:

CPU: ARMv7 Processor [560f5815] revision 5 (ARMv7), cr=10c53c7d
diff mbox

Patch

diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
index c7cdb59..42d155e 100644
--- a/arch/arm/include/asm/tlbflush.h
+++ b/arch/arm/include/asm/tlbflush.h
@@ -169,7 +169,7 @@ 
 # define v6wbi_always_flags	(-1UL)
 #endif
 
-#define v7wbi_tlb_flags_smp	(TLB_WB | TLB_DCLEAN | TLB_BARRIER | \
+#define v7wbi_tlb_flags_smp	(TLB_WB | TLB_BARRIER | \
 				 TLB_V6_U_FULL | TLB_V6_U_PAGE | \
 				 TLB_V6_U_ASID | TLB_V6_BP | \
 				 TLB_V7_UIS_FULL | TLB_V7_UIS_PAGE | \
diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S
index bcaaa8d..a286d47 100644
--- a/arch/arm/mm/proc-v6.S
+++ b/arch/arm/mm/proc-v6.S
@@ -80,12 +80,10 @@  ENTRY(cpu_v6_do_idle)
 	mov	pc, lr
 
 ENTRY(cpu_v6_dcache_clean_area)
-#ifndef TLB_CAN_READ_FROM_L1_CACHE
 1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D entry
 	add	r0, r0, #D_CACHE_LINE_SIZE
 	subs	r1, r1, #D_CACHE_LINE_SIZE
 	bhi	1b
-#endif
 	mov	pc, lr
 
 /*
diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
index 78f520b..9704097 100644
--- a/arch/arm/mm/proc-v7-2level.S
+++ b/arch/arm/mm/proc-v7-2level.S
@@ -110,7 +110,8 @@  ENTRY(cpu_v7_set_pte_ext)
  ARM(	str	r3, [r0, #2048]! )
  THUMB(	add	r0, r0, #2048 )
  THUMB(	str	r3, [r0] )
-	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
+	ALT_SMP(mov	pc,lr)
+	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
 #endif
 	mov	pc, lr
 ENDPROC(cpu_v7_set_pte_ext)
diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 6ffd78c..363027e 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -73,7 +73,8 @@  ENTRY(cpu_v7_set_pte_ext)
 	tst	r3, #1 << (55 - 32)		@ L_PTE_DIRTY
 	orreq	r2, #L_PTE_RDONLY
 1:	strd	r2, r3, [r0]
-	mcr	p15, 0, r0, c7, c10, 1		@ flush_pte
+	ALT_SMP(mov	pc, lr)
+	ALT_UP (mcr	p15, 0, r0, c7, c10, 1)		@ flush_pte
 #endif
 	mov	pc, lr
 ENDPROC(cpu_v7_set_pte_ext)
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 3a3c015..37716b0 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -75,14 +75,14 @@  ENTRY(cpu_v7_do_idle)
 ENDPROC(cpu_v7_do_idle)
 
 ENTRY(cpu_v7_dcache_clean_area)
-#ifndef TLB_CAN_READ_FROM_L1_CACHE
+	ALT_SMP(mov	pc, lr)			@ MP extensions imply L1 PTW
+	ALT_UP(W(nop))
 	dcache_line_size r2, r3
 1:	mcr	p15, 0, r0, c7, c10, 1		@ clean D entry
 	add	r0, r0, r2
 	subs	r1, r1, r2
 	bhi	1b
 	dsb
-#endif
 	mov	pc, lr
 ENDPROC(cpu_v7_dcache_clean_area)