diff mbox series

[v3,09/15] KVM: arm64: Free removed stage-2 tables in RCU callback

Message ID 20221027221752.1683510-10-oliver.upton@linux.dev (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: Parallel stage-2 fault handling | expand

Commit Message

Oliver Upton Oct. 27, 2022, 10:17 p.m. UTC
There is no real urgency to free a stage-2 subtree that was pruned.
Nonetheless, KVM does the tear down in the stage-2 fault path while
holding the MMU lock.

Free removed stage-2 subtrees after an RCU grace period. To guarantee
all stage-2 table pages are freed before killing a VM, add an
rcu_barrier() to the flush path.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/include/asm/kvm_pgtable.h |  2 +-
 arch/arm64/kvm/mmu.c                 | 14 +++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

Comments

Sean Christopherson Nov. 1, 2022, 8:28 p.m. UTC | #1
On Thu, Oct 27, 2022, Oliver Upton wrote:
> There is no real urgency to free a stage-2 subtree that was pruned.
> Nonetheless, KVM does the tear down in the stage-2 fault path while
> holding the MMU lock.
> 
> Free removed stage-2 subtrees after an RCU grace period. To guarantee
> all stage-2 table pages are freed before killing a VM, add an
> rcu_barrier() to the flush path.

This is _very_ misleading.  The above paints RCU as an optimization of sorts to
avoid doing work while holding mmu_lock.  Freeing page tables in an RCU callback
is _required_ for correctness when allowing parallel page faults to remove page
tables, as holding mmu_lock for read in that case doesn't ensure no other CPU is
accessing and/or holds a reference to the to-be-freed page table.

IMO, this patch should to be squashed with the previous patch, "Protect stage-2
traversal with RCU".  One doesn't make any sense without the other.
Oliver Upton Nov. 1, 2022, 8:46 p.m. UTC | #2
On Tue, Nov 01, 2022 at 08:28:04PM +0000, Sean Christopherson wrote:
> On Thu, Oct 27, 2022, Oliver Upton wrote:
> > There is no real urgency to free a stage-2 subtree that was pruned.
> > Nonetheless, KVM does the tear down in the stage-2 fault path while
> > holding the MMU lock.
> > 

[ copy ]

> This is _very_ misleading.  The above paints RCU as an optimization of sorts to
> avoid doing work while holding mmu_lock.  Freeing page tables in an RCU callback
> is _required_ for correctness when allowing parallel page faults to remove page
> tables, as holding mmu_lock for read in that case doesn't ensure no other CPU is
> accessing and/or holds a reference to the to-be-freed page table.

Agree, but it is still important to reason about what is changing here
too. Moving work out of the vCPU fault path _is_ valuable, though
ancillary to the correctness requirements.

> IMO, this patch should to be squashed with the previous patch, "Protect stage-2
> traversal with RCU".  One doesn't make any sense without the other.

I had split these up back when this series was a lot more gnarly and
there was too much slop in a single diff. That isn't the case any more,
so yeah I'll squash them.

[ paste ]

> > Free removed stage-2 subtrees after an RCU grace period. To guarantee
> > all stage-2 table pages are freed before killing a VM, add an
> > rcu_barrier() to the flush path.

An aside, this is flat-out wrong now.

--
Thanks,
Oliver
Sean Christopherson Nov. 1, 2022, 9:07 p.m. UTC | #3
On Tue, Nov 01, 2022, Oliver Upton wrote:
> On Tue, Nov 01, 2022 at 08:28:04PM +0000, Sean Christopherson wrote:
> > On Thu, Oct 27, 2022, Oliver Upton wrote:
> > > There is no real urgency to free a stage-2 subtree that was pruned.
> > > Nonetheless, KVM does the tear down in the stage-2 fault path while
> > > holding the MMU lock.
> > > 
> 
> [ copy ]
> 
> > This is _very_ misleading.  The above paints RCU as an optimization of sorts to
> > avoid doing work while holding mmu_lock.  Freeing page tables in an RCU callback
> > is _required_ for correctness when allowing parallel page faults to remove page
> > tables, as holding mmu_lock for read in that case doesn't ensure no other CPU is
> > accessing and/or holds a reference to the to-be-freed page table.
> 
> Agree, but it is still important to reason about what is changing here
> too. Moving work out of the vCPU fault path _is_ valuable, though
> ancillary to the correctness requirements.

Sure, but that's at best a footnote.  Similar to protecting freeing, RCU isn't
the only option for moving work out of the vCPU fault path.  In fact, it's probably
one of the worst options because RCU callbacks run with soft IRQs disabled, i.e.
doing _too_ much in a RCU callback is a real problem.  If RCU weren't being used
to protect readers, deferring freeing via a workqueue, kthread, etc... would work
just as well, if not better.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d1859e8550df..f35e7d4d73f1 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -65,7 +65,7 @@  typedef kvm_pte_t __rcu *kvm_pteref_t;
 
 static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared)
 {
-	return rcu_dereference_check(pteref, shared);
+	return rcu_dereference_check(pteref, !shared);
 }
 
 static inline void kvm_pgtable_walk_begin(void)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 175a0e9a04b6..a1a623c5b069 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -130,9 +130,21 @@  static void kvm_s2_free_pages_exact(void *virt, size_t size)
 
 static struct kvm_pgtable_mm_ops kvm_s2_mm_ops;
 
+static void stage2_free_removed_table_rcu_cb(struct rcu_head *head)
+{
+	struct page *page = container_of(head, struct page, rcu_head);
+	void *pgtable = page_to_virt(page);
+	u32 level = page_private(page);
+
+	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, pgtable, level);
+}
+
 static void stage2_free_removed_table(void *addr, u32 level)
 {
-	kvm_pgtable_stage2_free_removed(&kvm_s2_mm_ops, addr, level);
+	struct page *page = virt_to_page(addr);
+
+	set_page_private(page, (unsigned long)level);
+	call_rcu(&page->rcu_head, stage2_free_removed_table_rcu_cb);
 }
 
 static void kvm_host_get_page(void *addr)