
ARM: socfpga: put back v7_invalidate_l1 in socfpga_secondary_startup

Message ID 20150708210734.GN7557@n2100.arm.linux.org.uk (mailing list archive)
State New, archived

Commit Message

Russell King - ARM Linux July 8, 2015, 9:07 p.m. UTC
On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
> The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
> user mode access.

Hmm.

I think what you've found is a(nother) latent bug in the CPU bring-up
code.

For SMP CPUs, the sequence we're following during early initialisation is:

1. Enable SMP coherency.
2. Invalidate the caches.

If the cache contains rubbish, enabling SMP coherency before invalidating
the cache is plainly an absurd thing to do.

Can you try the patch below - not tested in any way, so you may need to
tweak it, but it should allow us to prove that point.
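In outline, the ordering this establishes on the incoming CPU is roughly
the following (illustrative sketch only; the actual code is in the patch):

	adr	r12, __v7_setup_stack		@ scratch save area
	stmia	r12, {r0-r5, r7, r9-r11, lr}	@ preserve registers around the call
	bl	v7_invalidate_l1		@ 1. invalidate L1 while still non-coherent
	ldmia	r12, {r0-r5, r7, r9-r11, lr}
	mrc	p15, 0, r0, c1, c0, 1		@ 2. only now set ACTLR.SMP (bit 6)...
	orr	r0, r0, #(1 << 6)
	mcr	p15, 0, r0, c1, c0, 1		@ ...joining the coherent cluster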

Comments

dinguyen@opensource.altera.com July 8, 2015, 9:55 p.m. UTC | #1
On 07/08/2015 04:07 PM, Russell King - ARM Linux wrote:
> On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
>> The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
>> user mode access.
> 
> Hmm.
> 
> I think what you've found is a(nother) latent bug in the CPU bring-up
> code.
> 
> For SMP CPUs, the sequence we're following during early initialisation is:
> 
> 1. Enable SMP coherency.
> 2. Invalidate the caches.
> 
> If the cache contains rubbish, enabling SMP coherency before invalidating
> the cache is plainly an absurd thing to do.
> 
> Can you try the patch below - not tested in any way, so you may need to
> tweak it, but it should allow us to prove that point.
> 
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 0716bbe19872..db5137fc297d 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -275,6 +275,10 @@ __v7_b15mp_setup:
>  __v7_ca17mp_setup:
>  	mov	r10, #0
>  1:
> +	adr	r12, __v7_setup_stack		@ the local stack
> +	stmia	r12, {r0-r5, r7, r9-r11, lr}
> +	bl      v7_invalidate_l1
> +	ldmia	r12, {r0-r5, r7, r9-r11, lr}
>  #ifdef CONFIG_SMP
>  	ALT_SMP(mrc	p15, 0, r0, c1, c0, 1)
>  	ALT_UP(mov	r0, #(1 << 6))		@ fake it for UP
> @@ -283,7 +287,7 @@ __v7_ca17mp_setup:
>  	orreq	r0, r0, r10			@ Enable CPU-specific SMP bits
>  	mcreq	p15, 0, r0, c1, c0, 1
>  #endif
> -	b	__v7_setup
> +	b	__v7_setup_cont
>  
>  /*
>   * Errata:
> @@ -417,6 +421,7 @@ __v7_setup:
>  	bl      v7_invalidate_l1
>  	ldmia	r12, {r0-r5, r7, r9, r11, lr}
>  
> +__v7_setup_cont:
>  	and	r0, r9, #0xff000000		@ ARM?
>  	teq	r0, #0x41000000
>  	bne	__errata_finish
> @@ -480,7 +485,7 @@ ENDPROC(__v7_setup)
>  
>  	.align	2
>  __v7_setup_stack:
> -	.space	4 * 11				@ 11 registers
> +	.space	4 * 12				@ 12 registers
>  
>  	__INITDATA
>  
> 


This patch seems to have fixed the issue. The SoCFPGA platform is now
booting/rebooting reliably. Also, the patch applied as-is.

Also, I went back and studied up on the CPACR register. I misquoted the
value in my previous response; I was looking at CPACR on CPU0. For CPU1,
when the error happens, the value of CPACR is 0x0, so CP10 and CP11 are
set to "Access denied".

Dinh
Jisheng Zhang July 9, 2015, 3:52 a.m. UTC | #2
Dear Russell,

On Wed, 8 Jul 2015 22:07:34 +0100
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
> > The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
> > user mode access.
> 
> Hmm.
> 
> I think what you've found is a(nother) latent bug in the CPU bring-up
> code.
> 
> For SMP CPUs, the sequence we're following during early initialisation is:
> 
> 1. Enable SMP coherency.
> 2. Invalidate the caches.
> 
> If the cache contains rubbish, enabling SMP coherency before invalidating
> the cache is plainly an absurd thing to do.
> 
> Can you try the patch below - not tested in any way, so you may need to
> tweak it, but it should allow us to prove that point.
> 
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 0716bbe19872..db5137fc297d 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -275,6 +275,10 @@ __v7_b15mp_setup:
>  __v7_ca17mp_setup:
>  	mov	r10, #0
>  1:
> +	adr	r12, __v7_setup_stack		@ the local stack
> +	stmia	r12, {r0-r5, r7, r9-r11, lr}
> +	bl      v7_invalidate_l1
> +	ldmia	r12, {r0-r5, r7, r9-r11, lr}

Some CPUs, such as the CA7, need to enable SMP before any cache maintenance.

The CA7 TRM says this about the SMP bit:
"You must ensure this bit is set to 1 before the caches and MMU are enabled,
or any cache and TLB maintenance operations are performed."
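
For context, the bit in question is ACTLR.SMP, bit 6 of the Auxiliary
Control Register; stripped of the ALT_SMP/ALT_UP alternatives, the
CONFIG_SMP block quoted below sets it roughly like this (sketch only):

	mrc	p15, 0, r0, c1, c0, 1	@ read ACTLR
	tst	r0, #(1 << 6)		@ SMP bit already set?
	orreq	r0, r0, #(1 << 6)	@ if not, enable SMP mode
	mcreq	p15, 0, r0, c1, c0, 1	@ and write ACTLR back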

So it seems we need to use a different path for different CPUs.

Also, the CA7 invalidates its L1 automatically on reset, so can we remove
the invalidate op in the CA7 case?

I'm not sure I understand the code correctly; criticism is welcome.

Thanks,
Jisheng

>  #ifdef CONFIG_SMP
>  	ALT_SMP(mrc	p15, 0, r0, c1, c0, 1)
>  	ALT_UP(mov	r0, #(1 << 6))		@ fake it for UP
> @@ -283,7 +287,7 @@ __v7_ca17mp_setup:
>  	orreq	r0, r0, r10			@ Enable CPU-specific SMP bits
>  	mcreq	p15, 0, r0, c1, c0, 1
>  #endif
> -	b	__v7_setup
> +	b	__v7_setup_cont
>  
>  /*
>   * Errata:
> @@ -417,6 +421,7 @@ __v7_setup:
>  	bl      v7_invalidate_l1
>  	ldmia	r12, {r0-r5, r7, r9, r11, lr}
>  
> +__v7_setup_cont:
>  	and	r0, r9, #0xff000000		@ ARM?
>  	teq	r0, #0x41000000
>  	bne	__errata_finish
> @@ -480,7 +485,7 @@ ENDPROC(__v7_setup)
>  
>  	.align	2
>  __v7_setup_stack:
> -	.space	4 * 11				@ 11 registers
> +	.space	4 * 12				@ 12 registers
>  
>  	__INITDATA
>
Russell King - ARM Linux July 9, 2015, 7:57 a.m. UTC | #3
On Thu, Jul 09, 2015 at 11:52:49AM +0800, Jisheng Zhang wrote:
> Dear Russell,
> 
> On Wed, 8 Jul 2015 22:07:34 +0100
> Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> 
> > On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
> > > The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
> > > user mode access.
> > 
> > Hmm.
> > 
> > I think what you've found is a(nother) latent bug in the CPU bring-up
> > code.
> > 
> > For SMP CPUs, the sequence we're following during early initialisation is:
> > 
> > 1. Enable SMP coherency.
> > 2. Invalidate the caches.
> > 
> > If the cache contains rubbish, enabling SMP coherency before invalidating
> > the cache is plainly an absurd thing to do.
> > 
> > Can you try the patch below - not tested in any way, so you may need to
> > tweak it, but it should allow us to prove that point.
> > 
> > diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> > index 0716bbe19872..db5137fc297d 100644
> > --- a/arch/arm/mm/proc-v7.S
> > +++ b/arch/arm/mm/proc-v7.S
> > @@ -275,6 +275,10 @@ __v7_b15mp_setup:
> >  __v7_ca17mp_setup:
> >  	mov	r10, #0
> >  1:
> > +	adr	r12, __v7_setup_stack		@ the local stack
> > +	stmia	r12, {r0-r5, r7, r9-r11, lr}
> > +	bl      v7_invalidate_l1
> > +	ldmia	r12, {r0-r5, r7, r9-r11, lr}
> 
> Some CPUs, such as the CA7, need to enable SMP before any cache maintenance.
> 
> The CA7 TRM says this about the SMP bit:
> "You must ensure this bit is set to 1 before the caches and MMU are enabled,
> or any cache and TLB maintenance operations are performed."

Frankly, that's wrong for two reasons.  Think about it for a moment...

If the cache contains crap - in other words, it contains random
uninitialised data in the cache lines at random locations, some of
which are marked valid and some of which are marked dirty - then
enabling the SMP bit puts the caches into coherent mode, and they
join the coherent cluster.

That means those cache lines containing crap become visible to other
CPUs in the cluster, and can be migrated to other CPUs, and the crap
data in them becomes visible to other CPUs.  This leads to state
corruption on other CPUs in the cluster.

Moreover, the cache invalidation of the local L1 cache is broadcast
to other CPUs in the cluster, and _their_ caches are also invalidated,
again, leading to state corruption on already running CPUs.  We don't
want the invalidation of the incoming CPU to be broadcast to the other
CPUs.

This is all round a very bad thing.
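
Spelled out, the hazard is purely one of ordering (annotated sketch of
the old path):

	mcr	p15, 0, r0, c1, c0, 1	@ ACTLR.SMP=1: join the coherent cluster
					@ -> stale/dirty lines become visible
					@    to, and can migrate to, other CPUs
	bl	v7_invalidate_l1	@ too late: the invalidation may now be
					@ broadcast to already-running CPUs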

> Also, the CA7 invalidates its L1 automatically on reset, so can we remove
> the invalidate op in the CA7 case?

No, because we enter this path from multiple different situations, e.g.
after the decompressor has run, or after a boot loader has run which
may have enabled the caches and not properly invalidated them prior to
calling the kernel.
Jisheng Zhang July 9, 2015, 8:17 a.m. UTC | #4
Dear Russell,

On Thu, 9 Jul 2015 08:57:17 +0100
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Thu, Jul 09, 2015 at 11:52:49AM +0800, Jisheng Zhang wrote:
> > Dear Russell,
> > 
> > On Wed, 8 Jul 2015 22:07:34 +0100
> > Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> > 
> > > On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
> > > > The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
> > > > user mode access.
> > > 
> > > Hmm.
> > > 
> > > I think what you've found is a(nother) latent bug in the CPU bring-up
> > > code.
> > > 
> > > For SMP CPUs, the sequence we're following during early initialisation is:
> > > 
> > > 1. Enable SMP coherency.
> > > 2. Invalidate the caches.
> > > 
> > > If the cache contains rubbish, enabling SMP coherency before invalidating
> > > the cache is plainly an absurd thing to do.
> > > 
> > > Can you try the patch below - not tested in any way, so you may need to
> > > tweak it, but it should allow us to prove that point.
> > > 
> > > diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> > > index 0716bbe19872..db5137fc297d 100644
> > > --- a/arch/arm/mm/proc-v7.S
> > > +++ b/arch/arm/mm/proc-v7.S
> > > @@ -275,6 +275,10 @@ __v7_b15mp_setup:
> > >  __v7_ca17mp_setup:
> > >  	mov	r10, #0
> > >  1:
> > > +	adr	r12, __v7_setup_stack		@ the local stack
> > > +	stmia	r12, {r0-r5, r7, r9-r11, lr}
> > > +	bl      v7_invalidate_l1
> > > +	ldmia	r12, {r0-r5, r7, r9-r11, lr}
> > 
> > Some CPUs, such as the CA7, need to enable SMP before any cache maintenance.
> > 
> > The CA7 TRM says this about the SMP bit:
> > "You must ensure this bit is set to 1 before the caches and MMU are enabled,
> > or any cache and TLB maintenance operations are performed."
> 
> Frankly, that's wrong for two reasons.  Think about it for a moment...
> 
> If the cache contains crap - in other words, it contains random
> uninitialised data in the cache lines at random locations, some of
> which are marked valid and some of which are marked dirty - then
> enabling the SMP bit puts the caches into coherent mode, and they
> join the coherent cluster.
> 
> That means those cache lines containing crap become visible to other
> CPUs in the cluster, and can be migrated to other CPUs, and the crap
> data in them becomes visible to other CPUs.  This leads to state
> corruption on other CPUs in the cluster.
> 
> Moreover, the cache invalidation of the local L1 cache is broadcast
> to other CPUs in the cluster, and _their_ caches are also invalidated,
> again, leading to state corruption on already running CPUs.  We don't
> want the invalidation of the incoming CPU to be broadcast to the other
> CPUs.
> 
> This is all round a very bad thing.
> 
> > Also, the CA7 invalidates its L1 automatically on reset, so can we remove
> > the invalidate op in the CA7 case?
> 
> No, because we enter this path from multiple different situations, e.g.
> after the decompressor has run, or after a boot loader has run which
> may have enabled the caches and not properly invalidated them prior to
> calling the kernel.
> 

Got it. Thanks very much for your detailed explanation!
dinguyen@opensource.altera.com July 14, 2015, 12:15 p.m. UTC | #5
Hi Russell,

On 7/9/15 3:17 AM, Jisheng Zhang wrote:
> Dear Russell,
> 
> On Thu, 9 Jul 2015 08:57:17 +0100
> Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> 
>> On Thu, Jul 09, 2015 at 11:52:49AM +0800, Jisheng Zhang wrote:
>>> Dear Russell,
>>>
>>> On Wed, 8 Jul 2015 22:07:34 +0100
>>> Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
>>>
>>>> On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
>>>>> The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
>>>>> user mode access.
>>>>
>>>> Hmm.
>>>>
>>>> I think what you've found is a(nother) latent bug in the CPU bring-up
>>>> code.
>>>>
>>>> For SMP CPUs, the sequence we're following during early initialisation is:
>>>>
>>>> 1. Enable SMP coherency.
>>>> 2. Invalidate the caches.
>>>>
>>>> If the cache contains rubbish, enabling SMP coherency before invalidating
>>>> the cache is plainly an absurd thing to do.
>>>>
>>>> Can you try the patch below - not tested in any way, so you may need to
>>>> tweak it, but it should allow us to prove that point.
>>>>
>>>> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
>>>> index 0716bbe19872..db5137fc297d 100644
>>>> --- a/arch/arm/mm/proc-v7.S
>>>> +++ b/arch/arm/mm/proc-v7.S
>>>> @@ -275,6 +275,10 @@ __v7_b15mp_setup:
>>>>  __v7_ca17mp_setup:
>>>>  	mov	r10, #0
>>>>  1:
>>>> +	adr	r12, __v7_setup_stack		@ the local stack
>>>> +	stmia	r12, {r0-r5, r7, r9-r11, lr}
>>>> +	bl      v7_invalidate_l1
>>>> +	ldmia	r12, {r0-r5, r7, r9-r11, lr}
>>>
>>> Some CPUs, such as the CA7, need to enable SMP before any cache maintenance.
>>>
>>> The CA7 TRM says this about the SMP bit:
>>> "You must ensure this bit is set to 1 before the caches and MMU are enabled,
>>> or any cache and TLB maintenance operations are performed."
>>
>> Frankly, that's wrong for two reasons.  Think about it for a moment...
>>
>> If the cache contains crap - in other words, it contains random
>> uninitialised data in the cache lines at random locations, some of
>> which are marked valid and some of which are marked dirty - then
>> enabling the SMP bit puts the caches into coherent mode, and they
>> join the coherent cluster.
>>
>> That means those cache lines containing crap become visible to other
>> CPUs in the cluster, and can be migrated to other CPUs, and the crap
>> data in them becomes visible to other CPUs.  This leads to state
>> corruption on other CPUs in the cluster.
>>
>> Moreover, the cache invalidation of the local L1 cache is broadcast
>> to other CPUs in the cluster, and _their_ caches are also invalidated,
>> again, leading to state corruption on already running CPUs.  We don't
>> want the invalidation of the incoming CPU to be broadcast to the other
>> CPUs.
>>
>> This is all round a very bad thing.
>>
>>> Also, the CA7 invalidates its L1 automatically on reset, so can we remove
>>> the invalidate op in the CA7 case?
>>
>> No, because we enter this path from multiple different situations, e.g.
>> after the decompressor has run, or after a boot loader has run which
>> may have enabled the caches and not properly invalidated them prior to
>> calling the kernel.
>>
> 
> Got it. Thanks very much for your detailed explanation!
> 

Just wondering if you are still planning to send this patch and if you
need me to do anything to help?

Thanks,
Dinh
Russell King - ARM Linux July 15, 2015, 7:04 p.m. UTC | #6
On Tue, Jul 14, 2015 at 07:15:29AM -0500, Dinh Nguyen wrote:
> Hi Russell,
> 
> Just wondering if you are still planning to send this patch and if you
> need me to do anything to help?

Giving a tested-by tag would help. Thanks. :)
dinguyen@opensource.altera.com July 15, 2015, 7:23 p.m. UTC | #7
On 07/08/2015 04:07 PM, Russell King - ARM Linux wrote:
> On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
>> The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
>> user mode access.
> 
> Hmm.
> 
> I think what you've found is a(nother) latent bug in the CPU bring-up
> code.
> 
> For SMP CPUs, the sequence we're following during early initialisation is:
> 
> 1. Enable SMP coherency.
> 2. Invalidate the caches.
> 
> If the cache contains rubbish, enabling SMP coherency before invalidating
> the cache is plainly an absurd thing to do.
> 
> Can you try the patch below - not tested in any way, so you may need to
> tweak it, but it should allow us to prove that point.
> 
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 0716bbe19872..db5137fc297d 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -275,6 +275,10 @@ __v7_b15mp_setup:
>  __v7_ca17mp_setup:
>  	mov	r10, #0
>  1:
> +	adr	r12, __v7_setup_stack		@ the local stack
> +	stmia	r12, {r0-r5, r7, r9-r11, lr}
> +	bl      v7_invalidate_l1
> +	ldmia	r12, {r0-r5, r7, r9-r11, lr}
>  #ifdef CONFIG_SMP
>  	ALT_SMP(mrc	p15, 0, r0, c1, c0, 1)
>  	ALT_UP(mov	r0, #(1 << 6))		@ fake it for UP
> @@ -283,7 +287,7 @@ __v7_ca17mp_setup:
>  	orreq	r0, r0, r10			@ Enable CPU-specific SMP bits
>  	mcreq	p15, 0, r0, c1, c0, 1
>  #endif
> -	b	__v7_setup
> +	b	__v7_setup_cont
>  
>  /*
>   * Errata:
> @@ -417,6 +421,7 @@ __v7_setup:
>  	bl      v7_invalidate_l1
>  	ldmia	r12, {r0-r5, r7, r9, r11, lr}
>  
> +__v7_setup_cont:
>  	and	r0, r9, #0xff000000		@ ARM?
>  	teq	r0, #0x41000000
>  	bne	__errata_finish
> @@ -480,7 +485,7 @@ ENDPROC(__v7_setup)
>  
>  	.align	2
>  __v7_setup_stack:
> -	.space	4 * 11				@ 11 registers
> +	.space	4 * 12				@ 12 registers
>  
>  	__INITDATA
>  
> 

For this patch, please feel free to add:

Tested-by: Dinh Nguyen <dinguyen@opensource.altera.com>

Thanks,
Dinh
dinguyen@opensource.altera.com July 15, 2015, 8:11 p.m. UTC | #8
On 07/15/2015 02:04 PM, Russell King - ARM Linux wrote:
> On Tue, Jul 14, 2015 at 07:15:29AM -0500, Dinh Nguyen wrote:
>> Hi Russell,
>>
>> Just wondering if you are still planning to send this patch and if you
>> need me to do anything to help?
> 
> Giving a tested-by tag would help. Thanks. :)
> 

Okay, I've given my Tested-by in a follow-up response that has the patch
still in the email.

Thanks,
Dinh
Steffen Trumtrar July 16, 2015, 4:11 p.m. UTC | #9
Hi, Russell!

On Wed, Jul 15, 2015 at 02:23:52PM -0500, Dinh Nguyen wrote:
> On 07/08/2015 04:07 PM, Russell King - ARM Linux wrote:
> > On Wed, Jul 08, 2015 at 02:13:32PM -0500, Dinh Nguyen wrote:
> >> The value of CPACR is 0x00F00000. So cp11 and cp10 allow privileged and
> >> user mode access.
> > 
> > Hmm.
> > 
> > I think what you've found is a(nother) latent bug in the CPU bring-up
> > code.
> > 
> > For SMP CPUs, the sequence we're following during early initialisation is:
> > 
> > 1. Enable SMP coherency.
> > 2. Invalidate the caches.
> > 
> > If the cache contains rubbish, enabling SMP coherency before invalidating
> > the cache is plainly an absurd thing to do.
> > 
> > Can you try the patch below - not tested in any way, so you may need to
> > tweak it, but it should allow us to prove that point.
> > 
> > diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> > index 0716bbe19872..db5137fc297d 100644
> > --- a/arch/arm/mm/proc-v7.S
> > +++ b/arch/arm/mm/proc-v7.S
> > @@ -275,6 +275,10 @@ __v7_b15mp_setup:
> >  __v7_ca17mp_setup:
> >  	mov	r10, #0
> >  1:
> > +	adr	r12, __v7_setup_stack		@ the local stack
> > +	stmia	r12, {r0-r5, r7, r9-r11, lr}
> > +	bl      v7_invalidate_l1
> > +	ldmia	r12, {r0-r5, r7, r9-r11, lr}
> >  #ifdef CONFIG_SMP
> >  	ALT_SMP(mrc	p15, 0, r0, c1, c0, 1)
> >  	ALT_UP(mov	r0, #(1 << 6))		@ fake it for UP
> > @@ -283,7 +287,7 @@ __v7_ca17mp_setup:
> >  	orreq	r0, r0, r10			@ Enable CPU-specific SMP bits
> >  	mcreq	p15, 0, r0, c1, c0, 1
> >  #endif
> > -	b	__v7_setup
> > +	b	__v7_setup_cont
> >  
> >  /*
> >   * Errata:
> > @@ -417,6 +421,7 @@ __v7_setup:
> >  	bl      v7_invalidate_l1
> >  	ldmia	r12, {r0-r5, r7, r9, r11, lr}
> >  
> > +__v7_setup_cont:
> >  	and	r0, r9, #0xff000000		@ ARM?
> >  	teq	r0, #0x41000000
> >  	bne	__errata_finish
> > @@ -480,7 +485,7 @@ ENDPROC(__v7_setup)
> >  
> >  	.align	2
> >  __v7_setup_stack:
> > -	.space	4 * 11				@ 11 registers
> > +	.space	4 * 12				@ 12 registers
> >  
> >  	__INITDATA
> >  
> > 
> 
> For this patch, please feel free to add:
> 
> Tested-by: Dinh Nguyen <dinguyen@opensource.altera.com>
> 

I just ran into the same problem as Dinh. This patch fixed it for me, too.
Without it, 4.2-rc2 (at least) is pretty broken for me :-(
So, when you send a proper patch, you may also add:

	Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>

Thanks,
Steffen Trumtrar

Patch

diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 0716bbe19872..db5137fc297d 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -275,6 +275,10 @@ __v7_b15mp_setup:
 __v7_ca17mp_setup:
 	mov	r10, #0
 1:
+	adr	r12, __v7_setup_stack		@ the local stack
+	stmia	r12, {r0-r5, r7, r9-r11, lr}
+	bl      v7_invalidate_l1
+	ldmia	r12, {r0-r5, r7, r9-r11, lr}
 #ifdef CONFIG_SMP
 	ALT_SMP(mrc	p15, 0, r0, c1, c0, 1)
 	ALT_UP(mov	r0, #(1 << 6))		@ fake it for UP
@@ -283,7 +287,7 @@ __v7_ca17mp_setup:
 	orreq	r0, r0, r10			@ Enable CPU-specific SMP bits
 	mcreq	p15, 0, r0, c1, c0, 1
 #endif
-	b	__v7_setup
+	b	__v7_setup_cont
 
 /*
  * Errata:
@@ -417,6 +421,7 @@ __v7_setup:
 	bl      v7_invalidate_l1
 	ldmia	r12, {r0-r5, r7, r9, r11, lr}
 
+__v7_setup_cont:
 	and	r0, r9, #0xff000000		@ ARM?
 	teq	r0, #0x41000000
 	bne	__errata_finish
@@ -480,7 +485,7 @@ ENDPROC(__v7_setup)
 
 	.align	2
 __v7_setup_stack:
-	.space	4 * 11				@ 11 registers
+	.space	4 * 12				@ 12 registers
 
 	__INITDATA