diff mbox

[RFC] ARM: Flush L2 cache on soft_restart

Message ID 1380713656-24752-1-git-send-email-taras.kondratiuk@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Taras Kondratiuk Oct. 2, 2013, 11:34 a.m. UTC
Kexec disables L2 cache before jumping to reboot code,
but it doesn't flush it. So often just copied reboot code
gets corrupted, because part of it is stored in L2 cache
and have not reached memory.

Flushing cache prevents this corruption.

I'm facing this issue on Pandaboard ES, but it looks like
similar issue is observed on TC2 board [1].

[1] http://www.spinics.net/lists/arm-kernel/msg264339.html

Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
---
 arch/arm/kernel/process.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Will Deacon Oct. 2, 2013, 12:49 p.m. UTC | #1
On Wed, Oct 02, 2013 at 12:34:16PM +0100, Taras Kondratiuk wrote:
> Kexec disables L2 cache before jumping to reboot code,
> but it doesn't flush it. So often just copied reboot code
> gets corrupted, because part of it is stored in L2 cache
> and have not reached memory.
> 
> Flushing cache prevents this corruption.
> 
> I'm facing this issue on Pandaboard ES, but it looks like
> similar issue is observed on TC2 board [1].

TC2 doesn't have an outer cache, so that report is not relevant to this
patch.

> [1] http://www.spinics.net/lists/arm-kernel/msg264339.html
> 
> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
> ---
>  arch/arm/kernel/process.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index 94f6b05..e359b62 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -103,9 +103,11 @@ void soft_restart(unsigned long addr)
>  	local_irq_disable();
>  	local_fiq_disable();
>  
> -	/* Disable the L2 if we're the last man standing. */
> -	if (num_online_cpus() == 1)
> +	/* Flush and disable the L2 if we're the last man standing. */
> +	if (num_online_cpus() == 1) {
> +		outer_flush_all();
>  		outer_disable();

l2x0_disable already contains a flush, so this doesn't change anything.

Will
Taras Kondratiuk Oct. 2, 2013, 5:19 p.m. UTC | #2
On 2 October 2013 15:49, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Oct 02, 2013 at 12:34:16PM +0100, Taras Kondratiuk wrote:
>> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
>> index 94f6b05..e359b62 100644
>> --- a/arch/arm/kernel/process.c
>> +++ b/arch/arm/kernel/process.c
>> @@ -103,9 +103,11 @@ void soft_restart(unsigned long addr)
>>       local_irq_disable();
>>       local_fiq_disable();
>>
>> -     /* Disable the L2 if we're the last man standing. */
>> -     if (num_online_cpus() == 1)
>> +     /* Flush and disable the L2 if we're the last man standing. */
>> +     if (num_online_cpus() == 1) {
>> +             outer_flush_all();
>>               outer_disable();
>
> l2x0_disable already contains a flush, so this doesn't change anything.

Unfortunately not everybody uses l2x0_disable().
SoC's that use SMC calls for L2 cache maintenance have its own implementation
of outer_cache.disable which usually doesn't flush cache implicitly.
Will Deacon Oct. 2, 2013, 5:31 p.m. UTC | #3
On Wed, Oct 02, 2013 at 06:19:30PM +0100, Taras Kondratiuk wrote:
> On 2 October 2013 15:49, Will Deacon <will.deacon@arm.com> wrote:
> > On Wed, Oct 02, 2013 at 12:34:16PM +0100, Taras Kondratiuk wrote:
> >> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> >> index 94f6b05..e359b62 100644
> >> --- a/arch/arm/kernel/process.c
> >> +++ b/arch/arm/kernel/process.c
> >> @@ -103,9 +103,11 @@ void soft_restart(unsigned long addr)
> >>       local_irq_disable();
> >>       local_fiq_disable();
> >>
> >> -     /* Disable the L2 if we're the last man standing. */
> >> -     if (num_online_cpus() == 1)
> >> +     /* Flush and disable the L2 if we're the last man standing. */
> >> +     if (num_online_cpus() == 1) {
> >> +             outer_flush_all();
> >>               outer_disable();
> >
> > l2x0_disable already contains a flush, so this doesn't change anything.
> 
> Unfortunately not everybody uses l2x0_disable().
> SoC's that use SMC calls for L2 cache maintenance have its own implementation
> of outer_cache.disable which usually doesn't flush cache implicitly.

In which case, we should probably fix the disabling code to make a flush
then update callers not to bother with redundant flushing. The flushing
during the disable code is likely required anyway if there's any
synchronisation going on.

Will
Russell King - ARM Linux Oct. 2, 2013, 9:01 p.m. UTC | #4
On Wed, Oct 02, 2013 at 01:49:03PM +0100, Will Deacon wrote:
> On Wed, Oct 02, 2013 at 12:34:16PM +0100, Taras Kondratiuk wrote:
> > Kexec disables L2 cache before jumping to reboot code,
> > but it doesn't flush it. So often just copied reboot code
> > gets corrupted, because part of it is stored in L2 cache
> > and have not reached memory.
> > 
> > Flushing cache prevents this corruption.
> > 
> > I'm facing this issue on Pandaboard ES, but it looks like
> > similar issue is observed on TC2 board [1].
> 
> TC2 doesn't have an outer cache, so that report is not relevant to this
> patch.
> 
> > [1] http://www.spinics.net/lists/arm-kernel/msg264339.html
> > 
> > Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
> > ---
> >  arch/arm/kernel/process.c |    6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> > index 94f6b05..e359b62 100644
> > --- a/arch/arm/kernel/process.c
> > +++ b/arch/arm/kernel/process.c
> > @@ -103,9 +103,11 @@ void soft_restart(unsigned long addr)
> >  	local_irq_disable();
> >  	local_fiq_disable();
> >  
> > -	/* Disable the L2 if we're the last man standing. */
> > -	if (num_online_cpus() == 1)
> > +	/* Flush and disable the L2 if we're the last man standing. */
> > +	if (num_online_cpus() == 1) {
> > +		outer_flush_all();
> >  		outer_disable();
> 
> l2x0_disable already contains a flush, so this doesn't change anything.

The disable call _has_ to contain the flush as that's the only way it
can be disabled to avoid any loss of data when the outer cache stops
being searched for cache hits.
Taras Kondratiuk Oct. 3, 2013, 10:32 p.m. UTC | #5
On 2 October 2013 20:31, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Oct 02, 2013 at 06:19:30PM +0100, Taras Kondratiuk wrote:
>> On 2 October 2013 15:49, Will Deacon <will.deacon@arm.com> wrote:
>> > On Wed, Oct 02, 2013 at 12:34:16PM +0100, Taras Kondratiuk wrote:
>> >> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
>> >> index 94f6b05..e359b62 100644
>> >> --- a/arch/arm/kernel/process.c
>> >> +++ b/arch/arm/kernel/process.c
>> >> @@ -103,9 +103,11 @@ void soft_restart(unsigned long addr)
>> >>       local_irq_disable();
>> >>       local_fiq_disable();
>> >>
>> >> -     /* Disable the L2 if we're the last man standing. */
>> >> -     if (num_online_cpus() == 1)
>> >> +     /* Flush and disable the L2 if we're the last man standing. */
>> >> +     if (num_online_cpus() == 1) {
>> >> +             outer_flush_all();
>> >>               outer_disable();
>> >
>> > l2x0_disable already contains a flush, so this doesn't change anything.
>>
>> Unfortunately not everybody uses l2x0_disable().
>> SoC's that use SMC calls for L2 cache maintenance have its own implementation
>> of outer_cache.disable which usually doesn't flush cache implicitly.
>
> In which case, we should probably fix the disabling code to make a flush
> then update callers not to bother with redundant flushing. The flushing
> during the disable code is likely required anyway if there's any
> synchronisation going on.

It makes sense, but a history of the current code looks a bit confusing.
Implicit flush was added in l2x0_disable() as a "side effect" of
commit 38a8914f9ac2379293944f613e6ca24b61373de8
"ARM: 6987/1: l2x0: fix disabling function to avoid deadlock",
while initially disable was just a disable.

Maybe it worth to explicitly document that .disable callback must flush cache?
diff mbox

Patch

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 94f6b05..e359b62 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -103,9 +103,11 @@  void soft_restart(unsigned long addr)
 	local_irq_disable();
 	local_fiq_disable();
 
-	/* Disable the L2 if we're the last man standing. */
-	if (num_online_cpus() == 1)
+	/* Flush and disable the L2 if we're the last man standing. */
+	if (num_online_cpus() == 1) {
+		outer_flush_all();
 		outer_disable();
+	}
 
 	/* Change to the new stack and continue with the reset. */
 	call_with_stack(__soft_restart, (void *)addr, (void *)stack);