RFC: ARM: do not handle Spectre V2 on Vexpress CA9

Message ID 20181101125406.8874-1-linus.walleij@linaro.org (mailing list archive)
State RFC

Commit Message

Linus Walleij Nov. 1, 2018, 12:54 p.m. UTC
Since the introduction of the Spectre V2 fixes for ARMv7,
especially the BPIALL workaround, the Versatile Express CA9
with its fragile CA9 core tile simply doesn't boot for me
anymore.

If I turn on low level debugging the boot log stops short
at:

smp: Bringing up secondary CPUs ...
GIC: PPI13 is secure or misconfigured
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
CPU1: Spectre v2: using BPIALL workaround
GIC: PPI13 is secure or misconfigured
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
CPU1: Spectre v2: using BPIALL workaround

This is pretty much consistent behaviour. I think it managed
to boot at one point, but these fixes are definitely rubbing
this CPU the wrong way.

This (inelegant) patch works around the problem by
simply not applying any Spectre v2 fixes on the ARM
Vexpress CA9. My other A9 platforms seem to work fine
with the fixes, so this appears to be related to the
fragility of the core tile on this one reference design,
and maybe it would be acceptable to handle it like this.

I don't know how much this could be related to my particular
specimen; it would be great if others with this machine
could verify the problem.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 arch/arm/mm/proc-v7-bugs.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Robin Murphy Nov. 1, 2018, 1:56 p.m. UTC | #1
Hi Linus,

On 01/11/2018 12:54, Linus Walleij wrote:
> Since the introduction of the Spectre V2 fixes for ARMv7,
> especially the BPIALL workaround, the Versatile Express CA9
> with its fragile CA9 core tile simply doesn't boot for me
> anymore.
> 
> If I turn on low level debugging the boot log stops short
> at:
> 
> smp: Bringing up secondary CPUs ...
> GIC: PPI13 is secure or misconfigured
> CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> CPU1: Spectre v2: using BPIALL workaround
> GIC: PPI13 is secure or misconfigured
> CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> CPU1: Spectre v2: using BPIALL workaround
> 
> This is pretty much consistent behaviour, I think it managed
> to boot at one point but these fixes are definitely rubbing
> this CPU the wrong way.
> 
> This (not elegant) workaround tries to work around it by
> simply not applying any Spectre v2 fixes on the ARM
> Vexpress CA9. My other A9 platforms seem to work fine
> with the fixes, so this appears to be related to the
> fragility of the core tile on this one reference design,
> so maybe it would be acceptable to mitigate it like this.
> 
> I don't know how much this could be related to my particular
> specimen, it would be great if others with this machine
> could verify the problem.

As far as I remember, the usual V2P_CA9 problem where the L2 cache locks 
up for unfathomable reasons can come and go reasonably consistently 
depending on the exact kernel configuration/version, and ultimately 
seems like it might be more sensitive to the general shape and timing of 
the early boot code than any particular feature. As such I'd be wary 
that there might still be some other combinations of options which would 
manage to boot with the mitigation left enabled, and others which won't 
even if (or at worst *because*) we do skip it.

Robin.

> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>   arch/arm/mm/proc-v7-bugs.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm/mm/proc-v7-bugs.c b/arch/arm/mm/proc-v7-bugs.c
> index 5544b82a2e7a..10b9eedc3b37 100644
> --- a/arch/arm/mm/proc-v7-bugs.c
> +++ b/arch/arm/mm/proc-v7-bugs.c
> @@ -3,6 +3,7 @@
>   #include <linux/kernel.h>
>   #include <linux/psci.h>
>   #include <linux/smp.h>
> +#include <linux/of.h>
>   
>   #include <asm/cp15.h>
>   #include <asm/cputype.h>
> @@ -42,6 +43,11 @@ static void cpu_v7_spectre_init(void)
>   	const char *spectre_v2_method = NULL;
>   	int cpu = smp_processor_id();
>   
> +	if (of_machine_is_compatible("arm,vexpress,v2p-ca9")) {
> +		pr_err("CPU%u: Spectre v2: can't handle mitigations, CPU is vulnerable\n", cpu);
> +		return;
> +	}
> +
>   	if (per_cpu(harden_branch_predictor_fn, cpu))
>   		return;
>   
>

Sudeep Holla Nov. 1, 2018, 2:01 p.m. UTC | #2
On Thu, Nov 01, 2018 at 01:56:41PM +0000, Robin Murphy wrote:
> Hi Linus,
> 
> On 01/11/2018 12:54, Linus Walleij wrote:
> > Since the introduction of the Spectre V2 fixes for ARMv7,
> > especially the BPIALL workaround, the Versatile Express CA9
> > with its fragile CA9 core tile simply doesn't boot for me
> > anymore.
> > 
> > If I turn on low level debugging the boot log stops short
> > at:
> > 
> > smp: Bringing up secondary CPUs ...
> > GIC: PPI13 is secure or misconfigured
> > CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > CPU1: Spectre v2: using BPIALL workaround
> > GIC: PPI13 is secure or misconfigured
> > CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> > CPU1: Spectre v2: using BPIALL workaround
> > 
> > This is pretty much consistent behaviour, I think it managed
> > to boot at one point but these fixes are definitely rubbing
> > this CPU the wrong way.
> > 
> > This (not elegant) workaround tries to work around it by
> > simply not applying any Spectre v2 fixes on the ARM
> > Vexpress CA9. My other A9 platforms seem to work fine
> > with the fixes, so this appears to be related to the
> > fragility of the core tile on this one reference design,
> > so maybe it would be acceptable to mitigate it like this.
> > 
> > I don't know how much this could be related to my particular
> > specimen, it would be great if others with this machine
> > could verify the problem.
> 
> As far as I remember, the usual V2P_CA9 problem where the L2 cache locks up
> for unfathomable reasons can come and go reasonably consistently depending
> on the exact kernel configuration/version, and ultimately seems like it
> might be more sensitive to the general shape and timing of the early boot
> code than any particular feature. As such I'd be wary that there might still
> be some other combinations of options which would manage to boot with the
> mitigation left enabled, and others which won't even if (or at worst
> *because*) we do skip it.

Indeed, I was about to ask if disabling L2 makes it any more consistent.
We did try increasing the read/write/tag latencies in the past but never
got it consistent enough to push the change.

--
Regards,
Sudeep

Linus Walleij Nov. 1, 2018, 3:11 p.m. UTC | #3
On Thu, Nov 1, 2018 at 3:01 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Thu, Nov 01, 2018 at 01:56:41PM +0000, Robin Murphy wrote:

> > As far as I remember, the usual V2P_CA9 problem where the L2 cache locks up
> > for unfathomable reasons can come and go reasonably consistently depending
> > on the exact kernel configuration/version, and ultimately seems like it
> > might be more sensitive to the general shape and timing of the early boot
> > code than any particular feature. As such I'd be wary that there might still
> > be some other combinations of options which would manage to boot with the
> > mitigation left enabled, and others which won't even if (or at worst
> > *because*) we do skip it.
>
> Indeed, I was about to ask if disabling L2 makes it any more consistent.
> We did try increasing the read/write/tag latencies in the past but never
> got it consistent enough to push the change.

Yeah I have a patch increasing tag latencies in my tree that I got from
Russell, but it doesn't help with this.

I will try to run it with L2 disabled. If that works, maybe the solution is
simply to permanently disable L2 on Vexpress CA9 if it's this
unpredictable. Or are there people using this reference design in
performance-critical setups, so that is an absolute no-no?

I remember I even saw Android ported to this machine at one point.
Is that something that is being continuously maintained for newer
kernels and Android AOSP releases?

Yours,
Linus Walleij

Russell King (Oracle) Nov. 1, 2018, 5:25 p.m. UTC | #4
On Thu, Nov 01, 2018 at 01:56:41PM +0000, Robin Murphy wrote:
> Hi Linus,
> 
> On 01/11/2018 12:54, Linus Walleij wrote:
> >Since the introduction of the Spectre V2 fixes for ARMv7,
> >especially the BPIALL workaround, the Versatile Express CA9
> >with its fragile CA9 core tile simply doesn't boot for me
> >anymore.
> >
> >If I turn on low level debugging the boot log stops short
> >at:
> >
> >smp: Bringing up secondary CPUs ...
> >GIC: PPI13 is secure or misconfigured
> >CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >CPU1: Spectre v2: using BPIALL workaround
> >GIC: PPI13 is secure or misconfigured
> >CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
> >CPU1: Spectre v2: using BPIALL workaround
> >
> >This is pretty much consistent behaviour, I think it managed
> >to boot at one point but these fixes are definitely rubbing
> >this CPU the wrong way.
> >
> >This (not elegant) workaround tries to work around it by
> >simply not applying any Spectre v2 fixes on the ARM
> >Vexpress CA9. My other A9 platforms seem to work fine
> >with the fixes, so this appears to be related to the
> >fragility of the core tile on this one reference design,
> >so maybe it would be acceptable to mitigate it like this.
> >
> >I don't know how much this could be related to my particular
> >specimen, it would be great if others with this machine
> >could verify the problem.
> 
> As far as I remember, the usual V2P_CA9 problem where the L2 cache locks up
> for unfathomable reasons can come and go reasonably consistently depending
> on the exact kernel configuration/version, and ultimately seems like it
> might be more sensitive to the general shape and timing of the early boot
> code than any particular feature. As such I'd be wary that there might still
> be some other combinations of options which would manage to boot with the
> mitigation left enabled, and others which won't even if (or at worst
> *because*) we do skip it.

There are - for the Versatile Express CT9x4 in my build/boot system,
it's normal for the specific config for that system to pass, but it
fails when combined with other platforms - merely due to a different
kernel memory layout.

It's also kernel version dependent - there have been versions where
neither configuration has booted.

The platform is basically unreliable - it's not called a "test chip"
for no reason!

Linus Walleij Nov. 1, 2018, 9:36 p.m. UTC | #5
On Thu, Nov 1, 2018 at 6:25 PM Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:

> The platform is basically unreliable - it's not called a "test chip"
> for no reason!

I think you're right. Since I have qemu working for it, I just stick
with that in my development for now, and do the odd test on hardware
on the days it happens to be working. I guess this patch doesn't
help in general, it just helped today.

Yours,
Linus Walleij

Patch

diff --git a/arch/arm/mm/proc-v7-bugs.c b/arch/arm/mm/proc-v7-bugs.c
index 5544b82a2e7a..10b9eedc3b37 100644
--- a/arch/arm/mm/proc-v7-bugs.c
+++ b/arch/arm/mm/proc-v7-bugs.c
@@ -3,6 +3,7 @@ 
 #include <linux/kernel.h>
 #include <linux/psci.h>
 #include <linux/smp.h>
+#include <linux/of.h>
 
 #include <asm/cp15.h>
 #include <asm/cputype.h>
@@ -42,6 +43,11 @@  static void cpu_v7_spectre_init(void)
 	const char *spectre_v2_method = NULL;
 	int cpu = smp_processor_id();
 
+	if (of_machine_is_compatible("arm,vexpress,v2p-ca9")) {
+		pr_err("CPU%u: Spectre v2: can't handle mitigations, CPU is vulnerable\n", cpu);
+		return;
+	}
+
 	if (per_cpu(harden_branch_predictor_fn, cpu))
 		return;
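
For readers who want to see the shape of the check in isolation, the early return added by this patch can be modelled in user space roughly as below. This is only a sketch under stated assumptions: `struct machine`, `machine_is_compatible()` and `spectre_v2_method_for()` are hypothetical names invented for the illustration, the string match is a simplified stand-in for the kernel's `of_machine_is_compatible()`, and only the BPIALL method seen in the boot log is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the "compatible" strings of the DT root node. */
struct machine {
	const char *const *compatible;	/* array of compatible strings */
	size_t n_compatible;		/* number of entries in the array */
};

/*
 * Simplified user-space model of of_machine_is_compatible():
 * true if any entry in the machine's list equals @compat exactly.
 */
static bool machine_is_compatible(const struct machine *m, const char *compat)
{
	assert(m != NULL && compat != NULL);

	for (size_t i = 0; i < m->n_compatible; i++)
		if (strcmp(m->compatible[i], compat) == 0)
			return true;
	return false;
}

/*
 * Model of the gating in the patch: return the name of the mitigation
 * that would be selected, or NULL when the machine is skipped entirely.
 * Assumption: every other machine is a Cortex-A9 using BPIALL.
 */
static const char *spectre_v2_method_for(const struct machine *m)
{
	if (machine_is_compatible(m, "arm,vexpress,v2p-ca9"))
		return NULL;	/* mitigation skipped, CPU left vulnerable */

	return "BPIALL";
}
```

With a compatible list containing "arm,vexpress,v2p-ca9" the function returns NULL, and for any other list it returns "BPIALL". As the replies point out, the hang seems to depend on kernel layout and timing rather than on this one feature, so a check like this may only hide the problem for a given build.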