diff mbox series

[1/2] KVM: arm64: nvhe: Synchronise with page table walker on MMU update

Message ID 20230330100419.1436629-2-maz@kernel.org (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Synchronise speculative page table walks on translation regime change | expand

Commit Message

Marc Zyngier March 30, 2023, 10:04 a.m. UTC
When taking an exception between the EL1&0 translation regime and
the EL2 translation regime, the page table walker is allowed to
complete the walks started from EL0 or EL1 while running at EL2.

It means that altering the system registers that define the EL1&0
translation regime is fraught with danger *unless* we wait for
the completion of such walk with a DSB (R_LFHQG and subsequent
statements in the ARM ARM). We already did the right thing for
other external agents (SPE, TRBE), but not the PTW.

In the case of nVHE, this is a bit involved, as there are a number
of situations where this can happen (such as switching between
host and guest, invalidating TLBs...).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/debug-sr.c    |  2 --
 arch/arm64/kvm/hyp/nvhe/mem_protect.c |  7 +++++++
 arch/arm64/kvm/hyp/nvhe/switch.c      | 18 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/tlb.c         |  7 +++++++
 4 files changed, 32 insertions(+), 2 deletions(-)

Comments

Oliver Upton April 6, 2023, 4:56 p.m. UTC | #1
Hey Marc,

On Thu, Mar 30, 2023 at 11:04:18AM +0100, Marc Zyngier wrote:
> When taking an exception between the EL1&0 translation regime and
> the EL2 translation regime, the page table walker is allowed to
> complete the walks started from EL0 or EL1 while running at EL2.
> 
> It means that altering the system registers that define the EL1&0
> translation regime is fraught with danger *unless* we wait for
> the completion of such walk with a DSB (R_LFHQG and subsequent
> statements in the ARM ARM). We already did the right thing for
> other external agents (SPE, TRBE), but not the PTW.
> 
> In the case of nVHE, this is a bit involved, as there are a number
> of situations where this can happen (such as switching between
> host and guest, invalidating TLBs...).

I'm assuming that the dsb(ishst) done in some of the other TLB
invalidation handlers is sufficient, as R_LFHQG does not describe the
scope of the DSB (i.e. loads and/or stores). Nonetheless, short of any
special serialization rules, it seems probable for the PTW to have both
outstanding loads and stores.

Is there some other language in the architecture that speaks to the
effects of _any_ DSB on the PTW? I couldn't find it myself. In any case,
I'll have to take you at your word if you say it is sufficient :)
Marc Zyngier April 7, 2023, 11:26 a.m. UTC | #2
Hi Oliver,

On Thu, 06 Apr 2023 17:56:31 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hey Marc,
> 
> On Thu, Mar 30, 2023 at 11:04:18AM +0100, Marc Zyngier wrote:
> > When taking an exception between the EL1&0 translation regime and
> > the EL2 translation regime, the page table walker is allowed to
> > complete the walks started from EL0 or EL1 while running at EL2.
> > 
> > It means that altering the system registers that define the EL1&0
> > translation regime is fraught with danger *unless* we wait for
> > the completion of such walk with a DSB (R_LFHQG and subsequent
> > statements in the ARM ARM). We already did the right thing for
> > other external agents (SPE, TRBE), but not the PTW.
> > 
> > In the case of nVHE, this is a bit involved, as there are a number
> > of situations where this can happen (such as switching between
> > host and guest, invalidating TLBs...).
> 
> I'm assuming that the dsb(ishst) done in some of the other TLB
> invalidation handlers is sufficient, as R_LFHQG does not describe the
> scope of the DSB (i.e. loads and/or stores). Nonetheless, short of any
> special serialization rules, it seems probable for the PTW to have both
> outstanding loads and stores.

I too find the definition pretty light. My gut feeling is that we're
not really trying to synchronise against either loads or stores. We
are simply waiting for the PTW to complete (or give up) potential
speculative walks.

For TLBIs, we want to make sure that prior writes to the PTs are
observable, specially as we perform a broadcast invalidation.

But for external agents, we seem to always rely on an dsb(nsh), such
as for TRBE and SPE. My take is that if it is enough for them, it
should be enough for the PTW.

> Is there some other language in the architecture that speaks to the
> effects of _any_ DSB on the PTW? I couldn't find it myself. In any case,
> I'll have to take you at your word if you say it is sufficient :)

"Data Synchronization Barrier (DSB)" has a tiny bit more info, but
that's really thin.

Maybe some of the ARM folks on cc can chime in?

Thanks,

	M.
Marc Zyngier April 7, 2023, 11:37 a.m. UTC | #3
On Fri, 07 Apr 2023 12:26:26 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> Hi Oliver,
> 
> On Thu, 06 Apr 2023 17:56:31 +0100,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> > 
> > Hey Marc,
> > 
> > On Thu, Mar 30, 2023 at 11:04:18AM +0100, Marc Zyngier wrote:
> > > When taking an exception between the EL1&0 translation regime and
> > > the EL2 translation regime, the page table walker is allowed to
> > > complete the walks started from EL0 or EL1 while running at EL2.
> > > 
> > > It means that altering the system registers that define the EL1&0
> > > translation regime is fraught with danger *unless* we wait for
> > > the completion of such walk with a DSB (R_LFHQG and subsequent
> > > statements in the ARM ARM). We already did the right thing for
> > > other external agents (SPE, TRBE), but not the PTW.
> > > 
> > > In the case of nVHE, this is a bit involved, as there are a number
> > > of situations where this can happen (such as switching between
> > > host and guest, invalidating TLBs...).
> > 
> > I'm assuming that the dsb(ishst) done in some of the other TLB
> > invalidation handlers is sufficient, as R_LFHQG does not describe the
> > scope of the DSB (i.e. loads and/or stores). Nonetheless, short of any
> > special serialization rules, it seems probable for the PTW to have both
> > outstanding loads and stores.
> 
> I too find the definition pretty light. My gut feeling is that we're
> not really trying to synchronise against either loads or stores. We
> are simply waiting for the PTW to complete (or give up) potential
> speculative walks.
> 
> For TLBIs, we want to make sure that prior writes to the PTs are
> observable, specially as we perform a broadcast invalidation.
> 
> But for external agents, we seem to always rely on an dsb(nsh), such
> as for TRBE and SPE. My take is that if it is enough for them, it
> should be enough for the PTW.

I'll also add that dsb(nsh) orders both reads and writes either side
of the barrier.

However, I finally get your point about the TLBI code. It only orders
stores either side of the barrier, and I'm starting to wonder whether
we should upgrade it to dsb(ish)...

Thanks,

	M.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 2673bde62fad..d756b939f296 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -37,7 +37,6 @@  static void __debug_save_spe(u64 *pmscr_el1)
 
 	/* Now drain all buffered data to memory */
 	psb_csync();
-	dsb(nsh);
 }
 
 static void __debug_restore_spe(u64 pmscr_el1)
@@ -69,7 +68,6 @@  static void __debug_save_trace(u64 *trfcr_el1)
 	isb();
 	/* Drain the trace buffer to memory */
 	tsb_csync();
-	dsb(nsh);
 }
 
 static void __debug_restore_trace(u64 trfcr_el1)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 552653fa18be..2e9ec4a2a4a3 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -297,6 +297,13 @@  int __pkvm_prot_finalize(void)
 	params->vttbr = kvm_get_vttbr(mmu);
 	params->vtcr = host_mmu.arch.vtcr;
 	params->hcr_el2 |= HCR_VM;
+
+	/*
+	 * The CMO below not only cleans the updated params to the
+	 * PoC, but also provides the DSB that ensures ongoing
+	 * page-table walks that have started before we trapped to EL2
+	 * have completed.
+	 */
 	kvm_flush_dcache_to_poc(params, sizeof(*params));
 
 	write_sysreg(params->hcr_el2, hcr_el2);
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index c2cb46ca4fb6..71fa16a0dc77 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -272,6 +272,17 @@  int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	__debug_save_host_buffers_nvhe(vcpu);
 
+	/*
+	 * We're about to restore some new MMU state. Make sure
+	 * ongoing page-table walks that have started before we
+	 * trapped to EL2 have completed. This also synchronises the
+	 * above disabling of SPE and TRBE.
+	 *
+	 * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
+	 * rule R_LFHQG and subsequent information statements.
+	 */
+	dsb(nsh);
+
 	__kvm_adjust_pc(vcpu);
 
 	/*
@@ -306,6 +317,13 @@  int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	__timer_disable_traps(vcpu);
 	__hyp_vgic_save_state(vcpu);
 
+	/*
+	 * Same thing as before the guest run: we're about to switch
+	 * the MMU context, so let's make sure we don't have any
+	 * ongoing EL1&0 translations.
+	 */
+	dsb(nsh);
+
 	__deactivate_traps(vcpu);
 	__load_host_stage2();
 
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index d296d617f589..15c3e782dbd8 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -129,6 +129,13 @@  void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu)
 {
 	struct tlb_inv_context cxt;
 
+	/*
+	 * We're about to restore some guest MMU state. Make sure
+	 * ongoing page-table walks that have started before we
+	 * trapped to EL2 have completed.
+	 */
+	dsb(nsh);
+
 	/* Switch to requested VMID */
 	__tlb_switch_to_guest(mmu, &cxt);