diff mbox series

arm64/relocate_kernel: remove redundant but misleading code

Message ID 1596702378-29550-1-git-send-email-kernelfans@gmail.com (mailing list archive)
State New, archived
Headers show
Series arm64/relocate_kernel: remove redundant but misleading code | expand

Commit Message

Pingfan Liu Aug. 6, 2020, 8:26 a.m. UTC
The kexec switch sequence looks like the following:
  SYM_CODE_START(__cpu_soft_restart)
    ...
    pre_disable_mmu_workaround
    msr	sctlr_el1, x12
    ...
    br	x8

  SYM_CODE_START(arm64_relocate_new_kernel)
    ...
    pre_disable_mmu_workaround
    msr	sctlr_el2, x0
    ...

"msr sctlr_el2, x0" is misleading, because "br x8" jump to a physical
address, which has no entry in idmap. It implies that MMU has already been
fully off after "msr sctlr_el1, x12".

And according to "D12.2.101 SCTLR_EL2, System Control Register (EL2)" in
"ARM Architecture Reference Manual", actually, EL2&0 host accesses
to SCTLR_EL2 when using mnemonic SCTLR_EL1.

Hence removing the redundant but misleading code.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Remi Denis-Courmont <remi.denis.courmont@huawei.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/kernel/cpu-reset.S       |  4 ++++
 arch/arm64/kernel/relocate_kernel.S | 12 ------------
 2 files changed, 4 insertions(+), 12 deletions(-)

Comments

James Morse Aug. 6, 2020, 12:20 p.m. UTC | #1
Hi Liu,

On 06/08/2020 09:26, Pingfan Liu wrote:
> The kexec switch sequence looks like the following:
>   SYM_CODE_START(__cpu_soft_restart)
>     ...
>     pre_disable_mmu_workaround
>     msr	sctlr_el1, x12
>     ...
>     br	x8
> 
>   SYM_CODE_START(arm64_relocate_new_kernel)
>     ...
>     pre_disable_mmu_workaround
>     msr	sctlr_el2, x0
>     ...

> "msr sctlr_el2, x0" is misleading, because "br x8" jump to a physical
> address, which has no entry in idmap.

Even better: this code run from a copy allocated by kexec, its not in the idmap either.

See the memcpy() in machine_kexec().


> It implies that MMU has already been fully off after "msr sctlr_el1, x12".


> And according to "D12.2.101 SCTLR_EL2, System Control Register (EL2)" in
> "ARM Architecture Reference Manual", actually, EL2&0 host accesses
> to SCTLR_EL2 when using mnemonic SCTLR_EL1.

Only when HCR_EL2.E2H is enabled. If linux booted at EL2 on a non-VHE system, SCTLR_EL1
and SCTLR_EL2 are different registers, both of which are managed by linux/KVM.


> Hence removing the redundant but misleading code.

This isn't the reason its redundant...


> diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
> index 4a18055..37721eb 100644
> --- a/arch/arm64/kernel/cpu-reset.S
> +++ b/arch/arm64/kernel/cpu-reset.S
> @@ -35,6 +35,10 @@ SYM_CODE_START(__cpu_soft_restart)
>  	mov_q	x13, SCTLR_ELx_FLAGS
>  	bic	x12, x12, x13
>  	pre_disable_mmu_workaround
> +	/*
> +	 * either disable EL1&0 translation regime or disable EL2&0 translation
> +	 * regime if HCR_EL2.E2H == 1
> +	 */>  	msr	sctlr_el1, x12
>  	isb

On a VHE system, yes the cpu-reset.S disables EL2&0 by writing to SCTLR_EL1

But on a non-VHE system, that same code disabled EL1&0. cup-reset.S goes on to call
HVC_SOFT_RESTART for EL2, which may be serviced by KVM or the hyp-stub. (or maybe
something else that implements the hyp-stub api)

For kexec, on non-VHE both EL1&0 and EL2 get disabled.


> diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
> index 542d6ed..84eec95 100644
> --- a/arch/arm64/kernel/relocate_kernel.S
> +++ b/arch/arm64/kernel/relocate_kernel.S
> @@ -36,18 +36,6 @@ SYM_CODE_START(arm64_relocate_new_kernel)
>  	mov	x14, xzr			/* x14 = entry ptr */
>  	mov	x13, xzr			/* x13 = copy dest */
>  
> -	/* Clear the sctlr_el2 flags. */
> -	mrs	x0, CurrentEL
> -	cmp	x0, #CurrentEL_EL2
> -	b.ne	1f
> -	mrs	x0, sctlr_el2
> -	mov_q	x1, SCTLR_ELx_FLAGS
> -	bic	x0, x0, x1
> -	pre_disable_mmu_workaround
> -	msr	sctlr_el2, x0
> -	isb
> -1:

I agree this doesn't disable the MMU anymore.

This was originally kept to disable the I+C bits when Kdump interrupted KVM, but since KVM
formalised the hyp-stub API, and has this exact sequence to back its HVC_SOFT_RESTART, it
was only needed for the hyp-stub itself, which has no clue about these SCTLR_EL2 bits.

HVC_SOFT_RESTART only says it needs to disable the MMU. See
Documentation/virt/kvm/arm/hyp-abi.rst


I think its fine to remove this, but the reason is because el2_setup doesn't set those
bits, and KVM clears them when its unloaded, or has a HVC_SOFT_RESTART call.

It might be worth updating the document, but we'd need to check the guarantee is the same
on 32bit. I assume there is no out-of-tree user of the hyp-stub abi.


I don't think the E2H register redirection has anything to do with this.



Thanks,

James
Pingfan Liu Aug. 7, 2020, 2:14 p.m. UTC | #2
Hi Morse,

Appreciate for your patient explanation. I have no experience of
arm/kvm and after a quick through, I still have some questions. Please
correct me if I am wrong.

On Thu, Aug 6, 2020 at 8:20 PM James Morse <james.morse@arm.com> wrote:
>
> Hi Liu,
>
> On 06/08/2020 09:26, Pingfan Liu wrote:
> > The kexec switch sequence looks like the following:
> >   SYM_CODE_START(__cpu_soft_restart)
> >     ...
> >     pre_disable_mmu_workaround
> >     msr       sctlr_el1, x12
> >     ...
> >     br        x8
> >
> >   SYM_CODE_START(arm64_relocate_new_kernel)
> >     ...
> >     pre_disable_mmu_workaround
> >     msr       sctlr_el2, x0
> >     ...
>
> > "msr sctlr_el2, x0" is misleading, because "br x8" jump to a physical
> > address, which has no entry in idmap.
>
> Even better: this code run from a copy allocated by kexec, its not in the idmap either.
Yes, looks better.
>
> See the memcpy() in machine_kexec().
>
>
> > It implies that MMU has already been fully off after "msr sctlr_el1, x12".
>
>
> > And according to "D12.2.101 SCTLR_EL2, System Control Register (EL2)" in
> > "ARM Architecture Reference Manual", actually, EL2&0 host accesses
> > to SCTLR_EL2 when using mnemonic SCTLR_EL1.
>
> Only when HCR_EL2.E2H is enabled. If linux booted at EL2 on a non-VHE system, SCTLR_EL1
> and SCTLR_EL2 are different registers, both of which are managed by linux/KVM.
Yeah. I have realized these factors when composing this patch, but not
sure anything is missing.

Kexec on arm is introduced by the following two commits
d28f6df arm64/kexec: Add core kexec support
f9076ec arm64: Add back cpu reset routines

And at d28f6df, the code snippet is
in kernel/cpu-reset.S,
ENTRY(__cpu_soft_restart)
        /* Clear sctlr_el1 flags. */
        mrs     x12, sctlr_el1
        ldr     x13, =SCTLR_ELx_FLAGS
        bic     x12, x12, x13
        msr     sctlr_el1, x12
        isb

        cbz     x0, 1f                          // el2_switch?
        mov     x0, #HVC_SOFT_RESTART
        hvc     #0                              // no return
        //--> as the note, I think for both EL1&0 host and guest, they
will never resume the following code
        //--> so in the original patch, the rest code is only for EL2&0 host

1:      mov     x18, x1                         // entry
        mov     x0, x2                          // arg0
        mov     x1, x3                          // arg1
        mov     x2, x4                          // arg2
        br      x18
ENDPROC(__cpu_soft_restart)

And in kernel/hyp-stub.S
el1_sync:
        mrs     x30, esr_el2
        lsr     x30, x30, #ESR_ELx_EC_SHIFT

        cmp     x30, #ESR_ELx_EC_HVC64
        b.ne    9f                              // Not an HVC trap

        cmp     x0, #HVC_GET_VECTORS
        b.ne    1f
        mrs     x0, vbar_el2
        b       9f

1:      cmp     x0, #HVC_SET_VECTORS
        b.ne    2f
        msr     vbar_el2, x1
        b       9f

2:      cmp     x0, #HVC_SOFT_RESTART
        b.ne    3f
        mov     x0, x2
        mov     x2, x4
        mov     x4, x1
        mov     x1, x3
        br      x4                              // no return

        /* Someone called kvm_call_hyp() against the hyp-stub... */
3:      mov     x0, #ARM_EXCEPTION_HYP_GONE

9:      eret
ENDPROC(el1_sync)

I think for a soft restart, it jumps to the new kernel by 'br x4'. But
it seems a bug, which does not clear I+C bits, not disable EL2
translation regime.

On the other hand, if it wants to resume execution immediately after
"hvc #0" in ENTRY(__cpu_soft_restart), it should eret by the
modification of "spsr_el2.M[3:0] = PSR_MODE_EL2h", the code originates
from EL1, but the rest code tries to modify EL2 registers.

By this way, the code can do the exact things you mentioned. But there
seems to be a missing of such mechanisms.
>
>
> > Hence removing the redundant but misleading code.
>
> This isn't the reason its redundant...
>
>
> > diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
> > index 4a18055..37721eb 100644
> > --- a/arch/arm64/kernel/cpu-reset.S
> > +++ b/arch/arm64/kernel/cpu-reset.S
> > @@ -35,6 +35,10 @@ SYM_CODE_START(__cpu_soft_restart)
> >       mov_q   x13, SCTLR_ELx_FLAGS
> >       bic     x12, x12, x13
> >       pre_disable_mmu_workaround
> > +     /*
> > +      * either disable EL1&0 translation regime or disable EL2&0 translation
> > +      * regime if HCR_EL2.E2H == 1
> > +      */>    msr     sctlr_el1, x12
> >       isb
>
> On a VHE system, yes the cpu-reset.S disables EL2&0 by writing to SCTLR_EL1
>
> But on a non-VHE system, that same code disabled EL1&0. cup-reset.S goes on to call
> HVC_SOFT_RESTART for EL2, which may be serviced by KVM or the hyp-stub. (or maybe
> something else that implements the hyp-stub api)
>
> For kexec, on non-VHE both EL1&0 and EL2 get disabled.
Yes. I see the latest code in SYM_CODE_START(__kvm_handle_stub_hvc)
does the things exactly as what you said.
>
>
> > diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
> > index 542d6ed..84eec95 100644
> > --- a/arch/arm64/kernel/relocate_kernel.S
> > +++ b/arch/arm64/kernel/relocate_kernel.S
> > @@ -36,18 +36,6 @@ SYM_CODE_START(arm64_relocate_new_kernel)
> >       mov     x14, xzr                        /* x14 = entry ptr */
> >       mov     x13, xzr                        /* x13 = copy dest */
> >
> > -     /* Clear the sctlr_el2 flags. */
> > -     mrs     x0, CurrentEL
> > -     cmp     x0, #CurrentEL_EL2
> > -     b.ne    1f
> > -     mrs     x0, sctlr_el2
> > -     mov_q   x1, SCTLR_ELx_FLAGS
> > -     bic     x0, x0, x1
> > -     pre_disable_mmu_workaround
> > -     msr     sctlr_el2, x0
> > -     isb
> > -1:
>
> I agree this doesn't disable the MMU anymore.
>
> This was originally kept to disable the I+C bits when Kdump interrupted KVM, but since KVM
> formalised the hyp-stub API, and has this exact sequence to back its HVC_SOFT_RESTART, it
> was only needed for the hyp-stub itself, which has no clue about these SCTLR_EL2 bits.
>
> HVC_SOFT_RESTART only says it needs to disable the MMU. See
> Documentation/virt/kvm/arm/hyp-abi.rst
>
>
> I think its fine to remove this, but the reason is because el2_setup doesn't set those
> bits, and KVM clears them when its unloaded, or has a HVC_SOFT_RESTART call.
>
> It might be worth updating the document, but we'd need to check the guarantee is the same
> on 32bit. I assume there is no out-of-tree user of the hyp-stub abi.
I have no idea about 32bit. Could you enlighten me?

Again, I am a fresh man on arm/kvm. Please forgive me if I make any
silly mistakes.

Thanks for your time.

Pingfan
Pingfan Liu Aug. 12, 2020, 2:21 p.m. UTC | #3
Hi Morse,

I have read the arm64/kvm code, and been more clear about it now.

What do you think about the following commit log (just describe the fact)

    arm64/relocate_kernel: remove redundant code

    The kexec switch sequence looks like the following:
      SYM_CODE_START(__cpu_soft_restart)
              /* Clear sctlr_el1 flags. */
              mrs     x12, sctlr_el1
              mov_q   x13, SCTLR_ELx_FLAGS
              bic     x12, x12, x13
              pre_disable_mmu_workaround
              msr     sctlr_el1, x12
              isb

              cbz     x0, 1f                          // el2_switch?
              mov     x0, #HVC_SOFT_RESTART
              hvc     #0                              // no return

      1:      mov     x8, x1                          // entry
              mov     x0, x2                          // arg0
              mov     x1, x3                          // arg1
              mov     x2, x4                          // arg2
              br      x8
      SYM_CODE_END(__cpu_soft_restart)

      SYM_CODE_START(arm64_relocate_new_kernel)
        ...
        pre_disable_mmu_workaround
        msr     sctlr_el2, x0
        ...

    As for the shutdown of MMU and clearing of I+C bits, three cases should be
    considered:
    -1. Guest
    "msr sctlr_el1, x12" is enough to turn off EL1&0 translation regime and
    clear I+C bits.

    -2. EL2&0 host
    According to "D12.2.101 SCTLR_EL2, System Control Register (EL2)" in
    "ARM Architecture Reference Manual", actually, EL2&0 host accesses
    to SCTLR_EL2 when using mnemonic SCTLR_EL1.
    So "msr sctlr_el1, x12" is enough to turn off MMU and clear I+C bits.

    -3. EL1&0 host,
    "msr sctlr_el1, x12" turns off EL1&0 translation regime. As for EL2 regime,
    el2_setup doesn't turn on EL2 regime and set those bits , and KVM clears
    them when it's unloaded, or has a HVC_SOFT_RESTART call.

    As a conclusion, the shutdown of MMU and clearing I+C bits in
    SYM_CODE_START(arm64_relocate_new_kernel) is redundant.

Thanks,
Pingfan

On Thu, Aug 6, 2020 at 8:20 PM James Morse <james.morse@arm.com> wrote:
>
> Hi Liu,
>
> On 06/08/2020 09:26, Pingfan Liu wrote:
> > The kexec switch sequence looks like the following:
> >   SYM_CODE_START(__cpu_soft_restart)
> >     ...
> >     pre_disable_mmu_workaround
> >     msr       sctlr_el1, x12
> >     ...
> >     br        x8
> >
> >   SYM_CODE_START(arm64_relocate_new_kernel)
> >     ...
> >     pre_disable_mmu_workaround
> >     msr       sctlr_el2, x0
> >     ...
>
> > "msr sctlr_el2, x0" is misleading, because "br x8" jump to a physical
> > address, which has no entry in idmap.
>
> Even better: this code run from a copy allocated by kexec, its not in the idmap either.
>
> See the memcpy() in machine_kexec().
>
>
> > It implies that MMU has already been fully off after "msr sctlr_el1, x12".
>
>
> > And according to "D12.2.101 SCTLR_EL2, System Control Register (EL2)" in
> > "ARM Architecture Reference Manual", actually, EL2&0 host accesses
> > to SCTLR_EL2 when using mnemonic SCTLR_EL1.
>
> Only when HCR_EL2.E2H is enabled. If linux booted at EL2 on a non-VHE system, SCTLR_EL1
> and SCTLR_EL2 are different registers, both of which are managed by linux/KVM.
>
>
> > Hence removing the redundant but misleading code.
>
> This isn't the reason its redundant...
>
>
> > diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
> > index 4a18055..37721eb 100644
> > --- a/arch/arm64/kernel/cpu-reset.S
> > +++ b/arch/arm64/kernel/cpu-reset.S
> > @@ -35,6 +35,10 @@ SYM_CODE_START(__cpu_soft_restart)
> >       mov_q   x13, SCTLR_ELx_FLAGS
> >       bic     x12, x12, x13
> >       pre_disable_mmu_workaround
> > +     /*
> > +      * either disable EL1&0 translation regime or disable EL2&0 translation
> > +      * regime if HCR_EL2.E2H == 1
> > +      */>    msr     sctlr_el1, x12
> >       isb
>
> On a VHE system, yes the cpu-reset.S disables EL2&0 by writing to SCTLR_EL1
>
> But on a non-VHE system, that same code disabled EL1&0. cup-reset.S goes on to call
> HVC_SOFT_RESTART for EL2, which may be serviced by KVM or the hyp-stub. (or maybe
> something else that implements the hyp-stub api)
>
> For kexec, on non-VHE both EL1&0 and EL2 get disabled.
>
>
> > diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
> > index 542d6ed..84eec95 100644
> > --- a/arch/arm64/kernel/relocate_kernel.S
> > +++ b/arch/arm64/kernel/relocate_kernel.S
> > @@ -36,18 +36,6 @@ SYM_CODE_START(arm64_relocate_new_kernel)
> >       mov     x14, xzr                        /* x14 = entry ptr */
> >       mov     x13, xzr                        /* x13 = copy dest */
> >
> > -     /* Clear the sctlr_el2 flags. */
> > -     mrs     x0, CurrentEL
> > -     cmp     x0, #CurrentEL_EL2
> > -     b.ne    1f
> > -     mrs     x0, sctlr_el2
> > -     mov_q   x1, SCTLR_ELx_FLAGS
> > -     bic     x0, x0, x1
> > -     pre_disable_mmu_workaround
> > -     msr     sctlr_el2, x0
> > -     isb
> > -1:
>
> I agree this doesn't disable the MMU anymore.
>
> This was originally kept to disable the I+C bits when Kdump interrupted KVM, but since KVM
> formalised the hyp-stub API, and has this exact sequence to back its HVC_SOFT_RESTART, it
> was only needed for the hyp-stub itself, which has no clue about these SCTLR_EL2 bits.
>
> HVC_SOFT_RESTART only says it needs to disable the MMU. See
> Documentation/virt/kvm/arm/hyp-abi.rst
>
>
> I think its fine to remove this, but the reason is because el2_setup doesn't set those
> bits, and KVM clears them when its unloaded, or has a HVC_SOFT_RESTART call.
>
> It might be worth updating the document, but we'd need to check the guarantee is the same
> on 32bit. I assume there is no out-of-tree user of the hyp-stub abi.
>
>
> I don't think the E2H register redirection has anything to do with this.
>
>
>
> Thanks,
>
> James
James Morse Aug. 25, 2020, 5:56 p.m. UTC | #4
Hi Pingfan,

On 12/08/2020 15:21, Pingfan Liu wrote:
> I have read the arm64/kvm code, and been more clear about it now.
> 
> What do you think about the following commit log (just describe the fact)
> 
>     arm64/relocate_kernel: remove redundant code
> 
>     The kexec switch sequence looks like the following:
>       SYM_CODE_START(__cpu_soft_restart)
>               /* Clear sctlr_el1 flags. */
>               mrs     x12, sctlr_el1
>               mov_q   x13, SCTLR_ELx_FLAGS
>               bic     x12, x12, x13
>               pre_disable_mmu_workaround
>               msr     sctlr_el1, x12
>               isb
> 
>               cbz     x0, 1f                          // el2_switch?
>               mov     x0, #HVC_SOFT_RESTART
>               hvc     #0                              // no return
> 
>       1:      mov     x8, x1                          // entry
>               mov     x0, x2                          // arg0
>               mov     x1, x3                          // arg1
>               mov     x2, x4                          // arg2
>               br      x8
>       SYM_CODE_END(__cpu_soft_restart)

I think this is far to much to put in the commit message!

It should be a summary. We can always checkout the whole tree at the point the commit was
made to look at functions like this, we don't need to preserve them in the commit log.



>       SYM_CODE_START(arm64_relocate_new_kernel)
>         ...
>         pre_disable_mmu_workaround
>         msr     sctlr_el2, x0
>         ...
> 
>     As for the shutdown of MMU and clearing of I+C bits, three cases should be
>     considered:

>     -1. Guest

I don't think host/guest matter here. This one is really 'booted at EL1'.



>     "msr sctlr_el1, x12" is enough to turn off EL1&0 translation regime and
>     clear I+C bits.

>     -2. EL2&0 host

(Booted at EL2, using VHE)


>     According to "D12.2.101 SCTLR_EL2, System Control Register (EL2)" in
>     "ARM Architecture Reference Manual", actually, EL2&0 host accesses
>     to SCTLR_EL2 when using mnemonic SCTLR_EL1.
>     So "msr sctlr_el1, x12" is enough to turn off MMU and clear I+C bits.


>     -3. EL1&0 host,

(Booted at EL2, not using VHE)


>     "msr sctlr_el1, x12" turns off EL1&0 translation regime.

>      As for EL2 regime, el2_setup doesn't turn on EL2 regime
A regime isn't something you turn on, I think this should just be 'The hyp-stub doesn't
enable the MMU, see el2_setup.'

digression {
If you think about the EL1&0 regime when stage2 is enabled, is the regime only 'on' when
the stage1 MMU is enabled? Not really, when the stage1 MMU is disabled it outputs VA==IPA,
and the stage2 MMU translates this to PA. The whole thing belongs to the translation EL1&0
regime.
}


>     and set those bits , and KVM clears
>     them when it's unloaded, or has a HVC_SOFT_RESTART call.
> 
>     As a conclusion, the shutdown of MMU and clearing I+C bits in
>     SYM_CODE_START(arm64_relocate_new_kernel) is redundant.


This certainly covers all the cases.

As kexec is now depending on this behaviour of KVM, how do you feel about updating the
document at Documentation/virt/kvm/arm/hyp-abi.rst ?

I think all it needs is a "Clear the I+C bits (arm64 only)" under HVC_SOFT_RESTART.


Thanks,

James
Pingfan Liu Aug. 28, 2020, 8:38 a.m. UTC | #5
On Wed, Aug 26, 2020 at 1:56 AM James Morse <james.morse@arm.com> wrote:
>
> Hi Pingfan,
>
> On 12/08/2020 15:21, Pingfan Liu wrote:
> > I have read the arm64/kvm code, and been more clear about it now.
> >
> > What do you think about the following commit log (just describe the fact)
> >
> >     arm64/relocate_kernel: remove redundant code
> >
> >     The kexec switch sequence looks like the following:
> >       SYM_CODE_START(__cpu_soft_restart)
> >               /* Clear sctlr_el1 flags. */
> >               mrs     x12, sctlr_el1
> >               mov_q   x13, SCTLR_ELx_FLAGS
> >               bic     x12, x12, x13
> >               pre_disable_mmu_workaround
> >               msr     sctlr_el1, x12
> >               isb
> >
> >               cbz     x0, 1f                          // el2_switch?
> >               mov     x0, #HVC_SOFT_RESTART
> >               hvc     #0                              // no return
> >
> >       1:      mov     x8, x1                          // entry
> >               mov     x0, x2                          // arg0
> >               mov     x1, x3                          // arg1
> >               mov     x2, x4                          // arg2
> >               br      x8
> >       SYM_CODE_END(__cpu_soft_restart)
>
> I think this is far to much to put in the commit message!
>
> It should be a summary. We can always checkout the whole tree at the point the commit was
> made to look at functions like this, we don't need to preserve them in the commit log.
Yes, I will clean them.
>
>
>
> >       SYM_CODE_START(arm64_relocate_new_kernel)
> >         ...
> >         pre_disable_mmu_workaround
> >         msr     sctlr_el2, x0
> >         ...
> >
> >     As for the shutdown of MMU and clearing of I+C bits, three cases should be
> >     considered:
>
> >     -1. Guest
>
> I don't think host/guest matter here. This one is really 'booted at EL1'.
Yes, it is a more accurate view to consider and describe this issue.

>
>
>
> >     "msr sctlr_el1, x12" is enough to turn off EL1&0 translation regime and
> >     clear I+C bits.
>
> >     -2. EL2&0 host
>
> (Booted at EL2, using VHE)
Yes, I will use this term.

>
>
> >     According to "D12.2.101 SCTLR_EL2, System Control Register (EL2)" in
> >     "ARM Architecture Reference Manual", actually, EL2&0 host accesses
> >     to SCTLR_EL2 when using mnemonic SCTLR_EL1.
> >     So "msr sctlr_el1, x12" is enough to turn off MMU and clear I+C bits.
>
>
> >     -3. EL1&0 host,
>
> (Booted at EL2, not using VHE)
Ditto.
>
>
> >     "msr sctlr_el1, x12" turns off EL1&0 translation regime.
>
> >      As for EL2 regime, el2_setup doesn't turn on EL2 regime
> A regime isn't something you turn on, I think this should just be 'The hyp-stub doesn't
> enable the MMU, see el2_setup.'
OK, I will use this description.

>
> digression {
> If you think about the EL1&0 regime when stage2 is enabled, is the regime only 'on' when
> the stage1 MMU is enabled? Not really, when the stage1 MMU is disabled it outputs VA==IPA,
> and the stage2 MMU translates this to PA. The whole thing belongs to the translation EL1&0
> regime.
> }
Yes, I totally agree with you. And I revisit the manual so I can
describe this area more precisely in future.

>
>
> >     and set those bits , and KVM clears
> >     them when it's unloaded, or has a HVC_SOFT_RESTART call.
> >
> >     As a conclusion, the shutdown of MMU and clearing I+C bits in
> >     SYM_CODE_START(arm64_relocate_new_kernel) is redundant.
>
>
> This certainly covers all the cases.
>
> As kexec is now depending on this behaviour of KVM, how do you feel about updating the
> document at Documentation/virt/kvm/arm/hyp-abi.rst ?
Sure. I have sent a separate patch for the document issue. I will fold
the two patches into a series.

>
> I think all it needs is a "Clear the I+C bits (arm64 only)" under HVC_SOFT_RESTART.
OK

Thanks,
Pingfan
diff mbox series

Patch

diff --git a/arch/arm64/kernel/cpu-reset.S b/arch/arm64/kernel/cpu-reset.S
index 4a18055..37721eb 100644
--- a/arch/arm64/kernel/cpu-reset.S
+++ b/arch/arm64/kernel/cpu-reset.S
@@ -35,6 +35,10 @@  SYM_CODE_START(__cpu_soft_restart)
 	mov_q	x13, SCTLR_ELx_FLAGS
 	bic	x12, x12, x13
 	pre_disable_mmu_workaround
+	/*
+	 * either disable EL1&0 translation regime or disable EL2&0 translation
+	 * regime if HCR_EL2.E2H == 1
+	 */
 	msr	sctlr_el1, x12
 	isb
 
diff --git a/arch/arm64/kernel/relocate_kernel.S b/arch/arm64/kernel/relocate_kernel.S
index 542d6ed..84eec95 100644
--- a/arch/arm64/kernel/relocate_kernel.S
+++ b/arch/arm64/kernel/relocate_kernel.S
@@ -36,18 +36,6 @@  SYM_CODE_START(arm64_relocate_new_kernel)
 	mov	x14, xzr			/* x14 = entry ptr */
 	mov	x13, xzr			/* x13 = copy dest */
 
-	/* Clear the sctlr_el2 flags. */
-	mrs	x0, CurrentEL
-	cmp	x0, #CurrentEL_EL2
-	b.ne	1f
-	mrs	x0, sctlr_el2
-	mov_q	x1, SCTLR_ELx_FLAGS
-	bic	x0, x0, x1
-	pre_disable_mmu_workaround
-	msr	sctlr_el2, x0
-	isb
-1:
-
 	/* Check if the new image needs relocation. */
 	tbnz	x16, IND_DONE_BIT, .Ldone