
arm64: invalidate TLB before turning MMU on

Message ID 20181213052259.56352-1-cai@lca.pw (mailing list archive)
State New, archived
Series arm64: invalidate TLB before turning MMU on

Commit Message

Qian Cai Dec. 13, 2018, 5:22 a.m. UTC
On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
dump just hung. Each core has 4 threads, and every 2 cores share the
same L1 and L2 caches, so 8 CPUs share those. All CPUs share the same
L3 cache.

It turned out that this was because the TLB contained stale entries (or
uninitialized junk which just happened to look valid) from the first
kernel before the MMU was turned on in the second kernel, which caused
this instruction to hang:

msr	sctlr_el1, x0

Signed-off-by: Qian Cai <cai@lca.pw>
---
 arch/arm64/kernel/head.S | 4 ++++
 1 file changed, 4 insertions(+)
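
For reference, this is what the tail of __enable_mmu looks like with the
proposed maintenance in place. The comments below are explanatory annotations
added for this write-up, not part of the patch itself (the full diff is
reproduced at the bottom of this page):

	msr	ttbr0_el1, x2			// load TTBR0
	msr	ttbr1_el1, x1			// load TTBR1
	isb
	dsb	nshst				// complete earlier page-table stores (non-shareable domain)
	tlbi	vmalle1				// invalidate all stage 1 EL1&0 TLB entries on this CPU
	dsb	nsh				// wait for the invalidation to complete
	isb					// synchronize context before switching the MMU on
	msr	sctlr_el1, x0			// enable the MMU
	isb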

Comments

Bhupesh Sharma Dec. 13, 2018, 5:40 a.m. UTC | #1
Hi Qian Cai,

On Thu, Dec 13, 2018 at 10:53 AM Qian Cai <cai@lca.pw> wrote:
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. Each core has 4 threads, and every 2 cores share the
> same L1 and L2 caches, so 8 CPUs share those. All CPUs share the same
> L3 cache.
>
> It turned out that this was because the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) from the first
> kernel before the MMU was turned on in the second kernel, which caused
> this instruction to hang:
>
> msr     sctlr_el1, x0
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  arch/arm64/kernel/head.S | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..5196f3d729de 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
>         msr     ttbr0_el1, x2                   // load TTBR0
>         msr     ttbr1_el1, x1                   // load TTBR1
>         isb
> +       dsb     nshst
> +       tlbi    vmalle1                         // invalidate TLB
> +       dsb     nsh
> +       isb

This will be executed both for the primary and the kdump kernel, right? I
don't think we really want to invalidate the TLB when booting the
primary kernel. It would be too slow, and considering that we need to
minimize boot time on embedded arm64 devices, I think it would not be a
good idea.

>         msr     sctlr_el1, x0
>         isb
>         /*
> --
> 2.17.2 (Apple Git-113)
>

Also, did you check the issue I reported on the HPE Apollo machines
some days back with the kdump kernel boot:
<https://www.spinics.net/lists/kexec/msg21750.html>?
Can you please confirm that you are not facing the same issue (as I
suspect from reading your earlier bug report) on the HPE Apollo
machine? Also, by adding 'earlycon' to the bootargs being passed to the
kdump kernel, you can see if you are able to at least get some console
output from the kdump kernel.

Thanks,
Bhupesh
James Morse Dec. 13, 2018, 10:44 a.m. UTC | #2
Hi Qian,

On 13/12/2018 05:22, Qian Cai wrote:
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> dump just hung. Each core has 4 threads, and every 2 cores share the
> same L1 and L2 caches, so 8 CPUs share those. All CPUs share the same
> L3 cache.
> 
> It turned out that this was because the TLB contained stale entries (or
> uninitialized junk which just happened to look valid) from the first
> kernel before the MMU was turned on in the second kernel, which caused
> this instruction to hang:

This is a great find, thanks for debugging this!

The kernel should already handle this, as we don't trust the bootloader to clean
up either.

In arch/arm64/mm/proc.S::__cpu_setup()
|/*
| *	__cpu_setup
| *
| *	Initialise the processor for turning the MMU on.  Return in x0 the
| *	value of the SCTLR_EL1 register.
| */
| 	.pushsection ".idmap.text", "awx"
| ENTRY(__cpu_setup)
| 	tlbi	vmalle1				// Invalidate local TLB
| 	dsb	nsh

This is called from stext, which then branches to __primary_switch(), which
calls __enable_mmu() where you see this problem. It shouldn't be possible to
allocate new tlb entries between these points...

Do you have CONFIG_RANDOMIZE_BASE disabled? When it is enabled, it causes
__enable_mmu() to be called twice; the extra tlb maintenance is in
__primary_switch.
(If it works with this turned off, it points to the extra off/tlbi/on sequence.)
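
For context, that extra maintenance looks roughly like the following. This is
a simplified sketch of the CONFIG_RANDOMIZE_BASE branch of __primary_switch(),
written from memory rather than copied from head.S, with the register choices
(xN/xM here) left as placeholders:

	bl	__enable_mmu			// MMU on (first pass)
	// ... run far enough into __primary_switched to pick a KASLR seed ...
	msr	sctlr_el1, xN			// xN: pre-MMU SCTLR_EL1 value, MMU off again
	isb
	bl	__create_page_tables		// recreate the kernel mapping at the new offset
	tlbi	vmalle1				// the extra maintenance: drop entries from the first walk
	dsb	nsh
	msr	sctlr_el1, xM			// xM: MMU-on SCTLR_EL1 value, MMU back on
	isb
	ic	iallu				// flush instructions fetched via the old mapping
	dsb	nsh
	isb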


> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 4471f570a295..5196f3d729de 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
>  	msr	ttbr0_el1, x2			// load TTBR0
>  	msr	ttbr1_el1, x1			// load TTBR1
>  	isb
> +	dsb	nshst
> +	tlbi	vmalle1				// invalidate TLB
> +	dsb	nsh
> +	isb
>  	msr	sctlr_el1, x0
>  	isb

The overall change here is that we do extra maintenance later.

Can you move this around to bisect where the TLB entries are either coming
from, or failing to be invalidated?
Do your first and kdump kernels have the same VA_BITS/PAGE_SIZE?
As a stab in the dark, (totally untested):
------------------------------%<------------------------------
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 2c75b0b903ae..a5f3b7314bda 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
  */
        .pushsection ".idmap.text", "awx"
 ENTRY(__cpu_setup)
-       tlbi    vmalle1                         // Invalidate local TLB
-       dsb     nsh
-
        mov     x0, #3 << 20
        msr     cpacr_el1, x0                   // Enable FP/ASIMD
        mov     x0, #1 << 12                    // Reset mdscr_el1 and disable
@@ -465,5 +462,10 @@ ENTRY(__cpu_setup)
 1:
 #endif /* CONFIG_ARM64_HW_AFDBM */
        msr     tcr_el1, x10
+       isb
+
+       tlbi    vmalle1                         // Invalidate local TLB
+       dsb     nsh
+
        ret                                     // return to head.S
 ENDPROC(__cpu_setup)
------------------------------%<------------------------------


Thanks,

James
Qian Cai Dec. 13, 2018, 1:39 p.m. UTC | #3
On Thu, 2018-12-13 at 11:10 +0530, Bhupesh Sharma wrote:
> Hi Qian Cai,
> 
> On Thu, Dec 13, 2018 at 10:53 AM Qian Cai <cai@lca.pw> wrote:
> > 
> > On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash
> > dump just hung. Each core has 4 threads, and every 2 cores share the
> > same L1 and L2 caches, so 8 CPUs share those. All CPUs share the same
> > L3 cache.
> > 
> > It turned out that this was because the TLB contained stale entries (or
> > uninitialized junk which just happened to look valid) from the first
> > kernel before the MMU was turned on in the second kernel, which caused
> > this instruction to hang:
> > 
> > msr     sctlr_el1, x0
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> >  arch/arm64/kernel/head.S | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 4471f570a295..5196f3d729de 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> >         msr     ttbr0_el1, x2                   // load TTBR0
> >         msr     ttbr1_el1, x1                   // load TTBR1
> >         isb
> > +       dsb     nshst
> > +       tlbi    vmalle1                         // invalidate TLB
> > +       dsb     nsh
> > +       isb
> 
> This will be executed both for the primary and the kdump kernel, right? I
> don't think we really want to invalidate the TLB when booting the
> primary kernel. It would be too slow, and considering that we need to
> minimize boot time on embedded arm64 devices, I think it would not be a
> good idea.

Yes, it will be executed for the first kernel as well. As James mentioned, it
needs to be done anyway to invalidate TLB entries that might have been left
behind by the bootloader.

> 
> >         msr     sctlr_el1, x0
> >         isb
> >         /*
> > --
> > 2.17.2 (Apple Git-113)
> > 
> 
> Also, did you check the issue I reported on the HPE Apollo machines
> some days back with the kdump kernel boot:
> <https://www.spinics.net/lists/kexec/msg21750.html>?
> Can you please confirm that you are not facing the same issue (as I
> suspect from reading your earlier bug report) on the HPE Apollo
> machine? Also, by adding 'earlycon' to the bootargs being passed to the
> kdump kernel, you can see if you are able to at least get some console
> output from the kdump kernel.

No, I did not encounter the problem you mentioned here.
Qian Cai Dec. 13, 2018, 1:44 p.m. UTC | #4
On Thu, 2018-12-13 at 10:44 +0000, James Morse wrote:
> The kernel should already handle this, as we don't trust the bootloader to
> clean up either.
> 
> In arch/arm64/mm/proc.S::__cpu_setup()
> > /*
> > *	__cpu_setup
> > *
> > *	Initialise the processor for turning the MMU on.  Return in x0 the
> > *	value of the SCTLR_EL1 register.
> > */
> > 	.pushsection ".idmap.text", "awx"
> > ENTRY(__cpu_setup)
> > 	tlbi	vmalle1				// Invalidate local TLB
> > 	dsb	nsh
> 
> This is called from stext, which then branches to __primary_switch(), which
> calls __enable_mmu() where you see this problem. It shouldn't be possible to
> allocate new tlb entries between these points...
> 
> Do you have CONFIG_RANDOMIZE_BASE disabled? When it is enabled, it causes
> __enable_mmu() to be called twice; the extra tlb maintenance is in
> __primary_switch.
> (If it works with this turned off, it points to the extra off/tlbi/on
> sequence.)

Yes, CONFIG_RANDOMIZE_BASE is NOT set.

> 
> 
> > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> > index 4471f570a295..5196f3d729de 100644
> > --- a/arch/arm64/kernel/head.S
> > +++ b/arch/arm64/kernel/head.S
> > @@ -771,6 +771,10 @@ ENTRY(__enable_mmu)
> >  	msr	ttbr0_el1, x2			// load TTBR0
> >  	msr	ttbr1_el1, x1			// load TTBR1
> >  	isb
> > +	dsb	nshst
> > +	tlbi	vmalle1				// invalidate TLB
> > +	dsb	nsh
> > +	isb
> >  	msr	sctlr_el1, x0
> >  	isb
> 
> The overall change here is that we do extra maintenance later.
> 
> Can you move this around to bisect where the TLB entries are either coming
> from, or failing to be invalidated?
> Do your first and kdump kernels have the same VA_BITS/PAGE_SIZE?

Yes,

CONFIG_ARM64_VA_BITS=48
CONFIG_ARM64_PAGE_SHIFT=16
# CONFIG_ARM64_4K_PAGES is not set
# CONFIG_ARM64_16K_PAGES is not set
CONFIG_ARM64_64K_PAGES=y

> As a stab in the dark, (totally untested):
> ------------------------------%<------------------------------
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 2c75b0b903ae..a5f3b7314bda 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -406,9 +406,6 @@ ENDPROC(idmap_kpti_install_ng_mappings)
>   */
>         .pushsection ".idmap.text", "awx"
>  ENTRY(__cpu_setup)
> -       tlbi    vmalle1                         // Invalidate local TLB
> -       dsb     nsh
> -
>         mov     x0, #3 << 20
>         msr     cpacr_el1, x0                   // Enable FP/ASIMD
>         mov     x0, #1 << 12                    // Reset mdscr_el1 and disable
> @@ -465,5 +462,10 @@ ENTRY(__cpu_setup)
>  1:
>  #endif /* CONFIG_ARM64_HW_AFDBM */
>         msr     tcr_el1, x10
> +       isb
> +
> +       tlbi    vmalle1                         // Invalidate local TLB
> +       dsb     nsh
> +
>         ret                                     // return to head.S
>  ENDPROC(__cpu_setup)
> ------------------------------%<------------------------------
> 

This patch works well too.

Patch

diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 4471f570a295..5196f3d729de 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -771,6 +771,10 @@  ENTRY(__enable_mmu)
 	msr	ttbr0_el1, x2			// load TTBR0
 	msr	ttbr1_el1, x1			// load TTBR1
 	isb
+	dsb	nshst
+	tlbi	vmalle1				// invalidate TLB
+	dsb	nsh
+	isb
 	msr	sctlr_el1, x0
 	isb
 	/*