ARM: Kirkwood: Add support for Excito Bubba B3

Message ID	20140102230832.GB9339@obsidianresearch.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> Date: Thu, 2 Jan 2014 16:08:32 -0700 From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> To: Ian Campbell <ijc@hellion.org.uk> Subject: Re: [PATCH] ARM: Kirkwood: Add support for Excito Bubba B3 Message-ID: <20140102230832.GB9339@obsidianresearch.com> References: <1388247131-19301-1-git-send-email-andrew@lunn.ch> <20131228170114.GH19878@titan.lakedaemon.net> <20140102194924.GA3321@obsidianresearch.com> <1388702192.16958.54.camel@hastur.hellion.org.uk> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <1388702192.16958.54.camel@hastur.hellion.org.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Gregory Clement <gregory.clement@free-electrons.com>, Andrew Lunn <andrew@lunn.ch>, Sebastian Hesselbarth <sebastian.hesselbarth@googlemail.com>, Jason Cooper <jason@lakedaemon.net>, linux ARM <linux-arm-kernel@lists.infradead.org> Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Jason Gunthorpe Jan. 2, 2014, 11:08 p.m. UTC

On Thu, Jan 02, 2014 at 10:36:32PM +0000, Ian Campbell wrote:
> On Thu, 2014-01-02 at 12:49 -0700, Jason Gunthorpe wrote:
> > On Sat, Dec 28, 2013 at 12:01:14PM -0500, Jason Cooper wrote:
> > 
> > > > + * Note: This requires a new'ish version of u-boot, which disables the
> > > > + * L2 cache. If your B3 silently fails to boot, u-boot is probably too
> > > > + * old. Either upgrade, or consider the following email:
> > > > + *
> > > > + * http://lists.debian.org/debian-arm/2012/08/msg00128.html
> > > 
> > > Nice, thanks for adding this.
> > 
> > Nifty.. But what is the root cause for the above?
> > 
> > I'm guessing that at some point the the L1 icache has been enabled,
> > the L1 dcache disabled, and the L2 cache enabled?
> > 
> > If the pv_fixup runs with both L1 caches off and an empty L2 then it
> > shouldn't cause a problem.
> 
> I'm not 100% sure but I think the issue is that the kernel expects to be
> running with all of the caches off, but u-boot incorrectly left the l2
> on. The L2 is probably not empty (because u-boot was using it).

I think I see the problem now.. The decompressor turns on the L1
dcache, which populates the enabled L2 with write through data. When
the decompressor turns the dcache off again the L2 retains some of the
kernel image and any data writes done in head.S while the dcache is
off could be corrupted if they overlap the L2's retention,
specifically the pv_fixup stuff, but other elements could potentially
be mangled too.

So all Marvell platforms with their L2 cache controller must boot into
the decompressor with the L2 cache completely off, or the kernel needs
additional Marvell L2 specific flushing code prior to enabling the MMU
:(

.. and since the problem only shows obviously with multiplatform
kernels that have been relocated it could be a broader, subtler issue
than just your one board ...

Jason C: What do you think about applying this [untested]?

Or do you think the kernel should support booting with L2 on (this is
just two instructions, but they need to be conditional on the CPU type
:( :( )?

-----
arm: Kirkwood: Log a FW_BUG if the L2 cache is turned on at boot

Linux can't cope if the the L2 cache is enabled at boot. It works
sometimes, if the data writes head.S does during the L1 cache-off
period do not overlap with lines in the L2.

This is unlikely if CONFIG_ARM_PATCH_PHYS_VIRT is used, as the
decompressor will pull a chunk of the kernel code into L2 cache.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Jason

Andrew Lunn Jan. 3, 2014, 12:44 a.m. UTC | #1

> .. and since the problem only shows obviously with multiplatform
> kernels that have been relocated it could be a broader, subtler issue
> than just your one board ...

It is not just this board. It is any board using an old mainline
u-boot. Any such system will not boot with a kernel after 3.1. It was
at that point that the issue was found and fixed in u-boot. My Bubba3
has a u-boot before that point...

> Jason C: What do you think about applying this [untested]?

I'm away for the weekend, so cannot test it.
 
> Or do you think the kernel should support booting with L2 on (this is
> just two instructions, but they need to be conditional on the CPU type
> :( :( )?

It would be good to at least put the fix into the impedance matcher,
so there is no need to mess with the kernel.

   Andrew

Jason Cooper Jan. 10, 2014, 7:20 p.m. UTC | #2

On Thu, Jan 02, 2014 at 04:08:32PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 02, 2014 at 10:36:32PM +0000, Ian Campbell wrote:
> > On Thu, 2014-01-02 at 12:49 -0700, Jason Gunthorpe wrote:
> > > On Sat, Dec 28, 2013 at 12:01:14PM -0500, Jason Cooper wrote:
> > > 
> > > > > + * Note: This requires a new'ish version of u-boot, which disables the
> > > > > + * L2 cache. If your B3 silently fails to boot, u-boot is probably too
> > > > > + * old. Either upgrade, or consider the following email:
> > > > > + *
> > > > > + * http://lists.debian.org/debian-arm/2012/08/msg00128.html
> > > > 
> > > > Nice, thanks for adding this.
> > > 
> > > Nifty.. But what is the root cause for the above?
> > > 
> > > I'm guessing that at some point the the L1 icache has been enabled,
> > > the L1 dcache disabled, and the L2 cache enabled?
> > > 
> > > If the pv_fixup runs with both L1 caches off and an empty L2 then it
> > > shouldn't cause a problem.
> > 
> > I'm not 100% sure but I think the issue is that the kernel expects to be
> > running with all of the caches off, but u-boot incorrectly left the l2
> > on. The L2 is probably not empty (because u-boot was using it).
> 
> I think I see the problem now.. The decompressor turns on the L1
> dcache, which populates the enabled L2 with write through data. When
> the decompressor turns the dcache off again the L2 retains some of the
> kernel image and any data writes done in head.S while the dcache is
> off could be corrupted if they overlap the L2's retention,
> specifically the pv_fixup stuff, but other elements could potentially
> be mangled too.
> 
> So all Marvell platforms with their L2 cache controller must boot into
> the decompressor with the L2 cache completely off, or the kernel needs
> additional Marvell L2 specific flushing code prior to enabling the MMU
> :(
> 
> .. and since the problem only shows obviously with multiplatform
> kernels that have been relocated it could be a broader, subtler issue
> than just your one board ...
> 
> Jason C: What do you think about applying this [untested]?

I'm of the opinion that the kernel shouldn't assume the state of the
system when it begins execution.  Sure it's nice to point fingers at the
bootloader (especially since it's obviously wrong in this situation),
but we need Linux to boot on systems with non-upgradeable bootloaders
which may be broken.  I'm specifically thinking about hobbyists
modifying boxes they bought at $bigbox stores.

So, I'd prefer to handle this more gracefully.  I don't have much
experience at the low-level init of the caches, couldn't we enable and
flush rather than throwing the error?

thx,

Jason (C).

> Or do you think the kernel should support booting with L2 on (this is
> just two instructions, but they need to be conditional on the CPU type
> :( :( )?
> 
> -----
> arm: Kirkwood: Log a FW_BUG if the L2 cache is turned on at boot
> 
> Linux can't cope if the the L2 cache is enabled at boot. It works
> sometimes, if the data writes head.S does during the L1 cache-off
> period do not overlap with lines in the L2.
> 
> This is unlikely if CONFIG_ARM_PATCH_PHYS_VIRT is used, as the
> decompressor will pull a chunk of the kernel code into L2 cache.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> 
> diff --git a/arch/arm/mm/cache-feroceon-l2.c b/arch/arm/mm/cache-feroceon-l2.c
> index 48bc3c0..68781a4 100644
> --- a/arch/arm/mm/cache-feroceon-l2.c
> +++ b/arch/arm/mm/cache-feroceon-l2.c
> @@ -331,7 +331,8 @@ static void __init enable_l2(void)
>                         enable_icache();
>                 if (d)
>                         enable_dcache();
> -       }
> +       } else
> +               pr_err(FW_BUG "Feroceon L2: bootloader left the L2 cache on!\n");
>  }
>  
>  void __init feroceon_l2_init(int __l2_wt_override)
> 
> 
> Jason

Jason Gunthorpe Jan. 10, 2014, 7:44 p.m. UTC | #3

On Fri, Jan 10, 2014 at 02:20:32PM -0500, Jason Cooper wrote:
> > Jason C: What do you think about applying this [untested]?
> 
> I'm of the opinion that the kernel shouldn't assume the state of the
> system when it begins execution.  Sure it's nice to point fingers at the
> bootloader (especially since it's obviously wrong in this situation),
> but we need Linux to boot on systems with non-upgradeable bootloaders
> which may be broken.  I'm specifically thinking about hobbyists
> modifying boxes they bought at $bigbox stores.

I liked the idea of your impedance matcher fixing up bootloader
problems. It would be pretty easy to have a feroceon specific path in
the impedance matcher to disable the L2.

> So, I'd prefer to handle this more gracefully.  I don't have much
> experience at the low-level init of the caches, couldn't we enable and
> flush rather than throwing the error?

It cannot be solved in cache-feroceon-l2.c.

In many cases you won't even get that far:
- It will blow up in head.S when the cache is off: decompressor wrote
  writeback data to into the L2 and uncached fetches see memory
  content prior to decompression
- It will blow up after head.S enables the L1: decompressor wrote
  data into the L2 but the relocation writes done with the L1 off
  made changes to memory that were not captured in the L2.

So to solve it gracefully it has to be done before the decompressor
enables the L1. Which leaves the bootloader, impedance matcher or
decompressor startup code as possible places.

The code required is a flush invalidate and then disable of the L2
cache, which is only about 5 instructions, but they must be
conditionalized somehow on the CPU type :(

The purpose of my patch is to highlight the problem to the bootloader
people. If you luck out and successfully boot with the L2 cache
enabled you can at least see the message that the bootloader is wrong.

Jason

Jason Cooper Jan. 10, 2014, 7:54 p.m. UTC | #4

On Fri, Jan 10, 2014 at 12:44:37PM -0700, Jason Gunthorpe wrote:
> On Fri, Jan 10, 2014 at 02:20:32PM -0500, Jason Cooper wrote:
> > > Jason C: What do you think about applying this [untested]?
> > 
> > I'm of the opinion that the kernel shouldn't assume the state of the
> > system when it begins execution.  Sure it's nice to point fingers at the
> > bootloader (especially since it's obviously wrong in this situation),
> > but we need Linux to boot on systems with non-upgradeable bootloaders
> > which may be broken.  I'm specifically thinking about hobbyists
> > modifying boxes they bought at $bigbox stores.
> 
> I liked the idea of your impedance matcher fixing up bootloader
> problems. It would be pretty easy to have a feroceon specific path in
> the impedance matcher to disable the L2.

Agreed.

> > So, I'd prefer to handle this more gracefully.  I don't have much
> > experience at the low-level init of the caches, couldn't we enable and
> > flush rather than throwing the error?
> 
> It cannot be solved in cache-feroceon-l2.c.
> 
> In many cases you won't even get that far:
> - It will blow up in head.S when the cache is off: decompressor wrote
>   writeback data to into the L2 and uncached fetches see memory
>   content prior to decompression
> - It will blow up after head.S enables the L1: decompressor wrote
>   data into the L2 but the relocation writes done with the L1 off
>   made changes to memory that were not captured in the L2.
> 
> So to solve it gracefully it has to be done before the decompressor
> enables the L1. Which leaves the bootloader, impedance matcher or
> decompressor startup code as possible places.
> 
> The code required is a flush invalidate and then disable of the L2
> cache, which is only about 5 instructions, but they must be
> conditionalized somehow on the CPU type :(

And there's the problem.  Even at my young LKML age, I know that's a
non-starter.  :)

> The purpose of my patch is to highlight the problem to the bootloader
> people. If you luck out and successfully boot with the L2 cache
> enabled you can at least see the message that the bootloader is wrong.

Sounds sane to me, feel free to add my Acked-by and put it in Russell's
patch tracker.  I'd also ask that you add the above summary to the
commit message.  I'm sure Russell doesn't need it, but it'll help
someone doing a 'git blame ...' down the road.

thx,

Jason.

Andrew Lunn Jan. 10, 2014, 11:02 p.m. UTC | #5

> > So, I'd prefer to handle this more gracefully.  I don't have much
> > experience at the low-level init of the caches, couldn't we enable and
> > flush rather than throwing the error?
> 
> It cannot be solved in cache-feroceon-l2.c.
> 
> In many cases you won't even get that far:
> - It will blow up in head.S when the cache is off: decompressor wrote
>   writeback data to into the L2 and uncached fetches see memory
>   content prior to decompression
> - It will blow up after head.S enables the L1: decompressor wrote
>   data into the L2 but the relocation writes done with the L1 off
>   made changes to memory that were not captured in the L2.

My impression with booting a lot of times while doing i bisect, was
that it got as far as:

Uncompressing Linux... done, booting the kernel.
Booting Linux on physical CPU 0x0
Linux version 3.13.0-rc1-dirty (lunn@laptop) (gcc version 4.4.5 (Debian 4.4.5-8) ) #7 PREEMPT Fri Jan 3 09:25:07 CST 2014           
CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
CPU: VIVT data cache, VIVT instruction cache
Machine: Marvell Kirkwood (Flattened Device Tree), model: QNAP TS219 family
Memory policy: ECC disabled, Data cache writeback

every time.

It frequently got as far as

Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 130048
Kernel command line: root=/dev/sda2 console=ttyS0,115200 earlyprintk
PID hash table entries: 2048 (order: 1, 8192 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 512704K/524288K available (4693K kernel code, 247K rwdata, 1264K rodata, 154K init, 616K bss, 11584K reserved)              
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xe0800000 - 0xff000000   ( 488 MB)
    lowmem  : 0xc0000000 - 0xe0000000   ( 512 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .text : 0xc0008000 - 0xc05d9678   (5958 kB)
      .init : 0xc05da000 - 0xc0600ab0   ( 155 kB)
      .data : 0xc0602000 - 0xc063fcc0   ( 248 kB)
       .bss : 0xc063fccc - 0xc06d9ef0   ( 617 kB)
SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Preemptible hierarchical RCU implementation.
NR_IRQS:114  
sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 21474ms

And never made it past

Console: colour dummy device 80x30
Calibrating delay loop... 1587.60 BogoMIPS (lpj=7938048)

with L2 enabled.

Now clearly there is a difference between how far it got before
everything stops, and the point where corruption has occurred and
unavoidable death is going to happen sometime in the future.

It does however seem possible to insert platform specific code soon
after it prints:

Machine: Marvell Kirkwood (Flattened Device Tree), model: QNAP TS219 family

but i never got around to trying to disable and flush L2 at that
point. And the hardware is now also on the other side of the pond, and
i have no way to remotely power cycle it.

  Andrew

Jason Gunthorpe Jan. 11, 2014, 12:19 a.m. UTC | #6

On Sat, Jan 11, 2014 at 12:02:48AM +0100, Andrew Lunn wrote:
> > > So, I'd prefer to handle this more gracefully.  I don't have much
> > > experience at the low-level init of the caches, couldn't we enable and
> > > flush rather than throwing the error?
> > 
> > It cannot be solved in cache-feroceon-l2.c.
> > 
> > In many cases you won't even get that far:
> > - It will blow up in head.S when the cache is off: decompressor wrote
> >   writeback data to into the L2 and uncached fetches see memory
> >   content prior to decompression
> > - It will blow up after head.S enables the L1: decompressor wrote
> >   data into the L2 but the relocation writes done with the L1 off
> >   made changes to memory that were not captured in the L2.
> 
> My impression with booting a lot of times while doing i bisect, was
> that it got as far as:
> 
> Uncompressing Linux... done, booting the kernel.
> Booting Linux on physical CPU 0x0
> Linux version 3.13.0-rc1-dirty (lunn@laptop) (gcc version 4.4.5 (Debian 4.4.5-8) ) #7 PREEMPT Fri Jan 3 09:25:07 CST 2014           
> CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053977
> CPU: VIVT data cache, VIVT instruction cache
> Machine: Marvell Kirkwood (Flattened Device Tree), model: QNAP TS219 family
> Memory policy: ECC disabled, Data cache writeback
> 
> every time.

It should be deterministic, but 'random' in the sense it is sensitive
to exactly the data pattern generated by the decompressor and exactly
the data pattern generated by the relocation writes - so the
corruption will vary depending on the kernel image content (position
of the relocations) and also the size of the L2.

You are likely seeing the 2nd failure I predicted, which is a failure
when a cache line containing a relocation is used, and happened to be
in the L2.

The first failure would require the kernel image to be smaller than
the cache or the cache eviction policy to be such that the initial
part of the decompressed image remains dirty and unevicted in the
L2.

> It does however seem possible to insert platform specific code soon
> after it prints:
> 
> Machine: Marvell Kirkwood (Flattened Device Tree), model: QNAP TS219 family

So if you attempt to do the flush/disable at this point you are
relying on the L2 containing no dirty lines that overlap with lines
that were changed by the relocator while the cache was off.

This is *probably* true in many cases, but it is sktechy, and creates
a hidden hard to find pitfall. This is basically the same 'probably'
that enables non-relocated kernels to boot.

Basically, you might get a bootable system, but it is technically
wrong and has a theoretical hole.

Jason

Russell King - ARM Linux Jan. 11, 2014, 3:59 p.m. UTC | #7

On Fri, Jan 10, 2014 at 02:20:32PM -0500, Jason Cooper wrote:
> I'm of the opinion that the kernel shouldn't assume the state of the
> system when it begins execution.

Rubbish.  The kernel's requirements are well documented, and have been for
years.  The reason we've documented it is because it _matters_, and it's
hard to sort out this crap before the kernel starts.

Think about it - think about how many different L2 caches there are out
there.

Think about how difficult it is to locate the L2 cache controller when
you don't know what platform you're running on.

Think about how difficult it is to disable the L2 cache.

Think about how difficult it is to flush the L2 cache, when you don't
know what sort it is, where it's located, or what instructions to use
to do that.

The kernel cares about its boot environment beacuse of these issues.

Jason Cooper Jan. 13, 2014, 2:44 p.m. UTC | #8

On Sat, Jan 11, 2014 at 03:59:00PM +0000, Russell King - ARM Linux wrote:
> On Fri, Jan 10, 2014 at 02:20:32PM -0500, Jason Cooper wrote:
> > I'm of the opinion that the kernel shouldn't assume the state of the
> > system when it begins execution.
> 
> Rubbish.  The kernel's requirements are well documented, and have been for
> years.  The reason we've documented it is because it _matters_, and it's
> hard to sort out this crap before the kernel starts.
> 
> Think about it - think about how many different L2 caches there are out
> there.
> 
> Think about how difficult it is to locate the L2 cache controller when
> you don't know what platform you're running on.
> 
> Think about how difficult it is to disable the L2 cache.
> 
> Think about how difficult it is to flush the L2 cache, when you don't
> know what sort it is, where it's located, or what instructions to use
> to do that.
> 
> The kernel cares about its boot environment beacuse of these issues.

Agreed.  My failure was applying the thought process for drivers (eg
mv643xx_eth) to the core kernel code.  With a network driver we are
often in a situation where sometimes the clocks are gated, sometimes
they aren't.

For example, doing tftp boot during development, and nandboot in
production.  In that case, sometimes the bootloader sets things up and
sometimes it doesn't depending on the bootloader environment.  The
mv643xx_eth driver can't assume the state that's handed to it.

But as you and JasonG have clearly explained, that doesn't carry weight
so early in the boot process for all the reasons you mentioned.

thx,

Jason.

ARM: Kirkwood: Add support for Excito Bubba B3

Commit Message

Comments

Patch