[0/8] KVM: x86/mmu: Allow TDP MMU (un)load to run in parallel

Message ID 20240111020048.844847-1-seanjc@google.com (mailing list archive)

Message

Sean Christopherson Jan. 11, 2024, 2 a.m. UTC
This series is the result of digging into why deleting a memslot, which on
x86 forces all vCPUs to reload a new MMU root, causes noticeably more jitter
in vCPUs and other tasks when running with the TDP MMU than the Shadow MMU
(with TDP enabled).

Patch 1 addresses the most obvious issue by simply zapping at a finer
granularity so that if a different task, e.g. a vCPU, wants to run on the
pCPU doing the zapping, it doesn't have to wait for KVM to zap an entire
1GiB region, which can take hundreds of microseconds (or more).  The
shadow MMU checks for need_resched() (and mmu_lock contention, see below)
every 10 zaps, which is why the shadow MMU doesn't induce the same level
of jitter.
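
To illustrate the "check every N zaps" pattern described above, here is a
minimal userspace sketch, not KVM code.  ZAP_BATCH, struct zap_stats,
zap_with_periodic_yield() and the should_yield callback are all hypothetical
names; the callback merely stands in for need_resched() and/or an mmu_lock
contention check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch size, mirroring the "every 10 zaps" check above. */
#define ZAP_BATCH 10

struct zap_stats {
	size_t zapped;	/* entries processed */
	size_t checks;	/* times the yield condition was polled */
	size_t yields;	/* times the loop would have yielded */
};

/* Stand-in for need_resched() and/or a lock contention check. */
typedef bool (*should_yield_fn)(void *ctx);

static struct zap_stats zap_with_periodic_yield(size_t nr_entries,
						should_yield_fn should_yield,
						void *ctx)
{
	struct zap_stats stats = { 0, 0, 0 };
	size_t batch = 0;

	assert(should_yield != NULL);

	for (size_t i = 0; i < nr_entries; i++) {
		stats.zapped++;		/* "zap" one entry */

		if (++batch < ZAP_BATCH)
			continue;

		/* Poll only once per batch to keep the common path cheap. */
		batch = 0;
		stats.checks++;
		if (should_yield(ctx))
			stats.yields++;	/* real code would drop the lock and reschedule */
	}
	return stats;
}

/* Example predicate: always asks the loop to yield. */
static bool always_yield(void *ctx)
{
	(void)ctx;
	return true;
}
```

The point of the sketch is only that the time between two polls is bounded
by the cost of one batch: the smaller the unit of work between polls, the
sooner a waiting task gets to run.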

On preemptible kernels, zapping at 4KiB granularity will also cause the
zapping task to yield mmu_lock much more aggressively if a writer comes
along.  That _sounds_ like a good thing, and most of the time it is, but
sometimes bouncing mmu_lock can be a big net negative:
https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com

While trying to figure out whether or not frequently yielding mmu_lock
would be a negative or positive, I ran into extremely high latencies for
loading TDP MMU roots on VMs with large-ish numbers of vCPUs, e.g. a vCPU
could end up taking more than a second to 

Long story short, the issue is that the TDP MMU acquires mmu_lock for
write when unloading roots, and again when loading a "new" root (in quotes
because most vCPUs end up loading an existing root).  With a decent number
of vCPUs, that results in a _lot_ of mmu_lock contention, as every vCPU will
take and release mmu_lock for write to unload its roots, and then again to
load a new root.  Due to rwlock's fairness (waiting writers block new
readers), the contention can result in rather nasty worst case scenarios.

Patches 6-8 fix the issues by taking mmu_lock for read.  The free path is
very straightforward and doesn't require any new protection (IIRC, the only
reason we didn't pursue this when reworking the TDP MMU zapping back at the
end of 2021 was because we had bigger issues to solve).  Allocating a new
root with mmu_lock held for read is a little harder, but still fairly easy.
KVM only needs to ensure that it doesn't create duplicate roots, because
everything that needs mmu_lock to ensure ordering must take mmu_lock for
write, i.e. is still mutually exclusive with new roots coming along.
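
One way to picture the "don't create duplicate roots" requirement is the
userspace sketch below.  It is an illustration under assumptions, not the
code in this series: struct root, struct root_list, the inner spinlock and
get_or_alloc_root() are hypothetical, and the outer rwlock that the caller
would hold for read is not modeled at all.  The idea shown is that readers
serialize only the lookup-then-insert step on a small inner lock, so two
concurrent loaders cannot both miss the lookup and both insert.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROOTS 8

struct root {
	int role;	/* hypothetical key identifying a compatible root */
	bool valid;
	int refcount;
};

struct root_list {
	atomic_flag lock;	/* hypothetical inner spinlock guarding roots[] */
	struct root roots[MAX_ROOTS];
	size_t nr_roots;
};

static void root_list_init(struct root_list *list)
{
	atomic_flag_clear(&list->lock);
	list->nr_roots = 0;
}

static void list_lock(struct root_list *list)
{
	while (atomic_flag_test_and_set_explicit(&list->lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void list_unlock(struct root_list *list)
{
	atomic_flag_clear_explicit(&list->lock, memory_order_release);
}

/*
 * The caller is assumed to hold the outer rwlock for read (not modeled).
 * Returns an existing valid root for @role, or creates one.  Returns NULL
 * if the table is full.
 */
static struct root *get_or_alloc_root(struct root_list *list, int role)
{
	struct root *root = NULL;

	assert(list != NULL);

	list_lock(list);
	for (size_t i = 0; i < list->nr_roots; i++) {
		if (list->roots[i].valid && list->roots[i].role == role) {
			root = &list->roots[i];
			root->refcount++;	/* reuse the existing root */
			goto out;
		}
	}
	if (list->nr_roots < MAX_ROOTS) {
		root = &list->roots[list->nr_roots++];
		root->role = role;
		root->valid = true;
		root->refcount = 1;
	}
out:
	list_unlock(list);
	return root;
}
```

In this model a second caller asking for the same role gets the first
caller's root with its refcount bumped, while an invalidated root is never
handed out again.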

Patches 2-5 are small cleanups to avoid doing work for invalid roots, e.g.
when zapping SPTEs purely to affect guest behavior, there's no need to zap
invalid roots because they are unreachable from the guest.
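
The filtering described above, together with the '-1' means "all address
spaces" convention from patch 3's title, can be sketched as a simple
predicate.  This is a standalone illustration; ALL_ADDRESS_SPACES,
struct mmu_root, root_matches() and count_roots_to_process() are
hypothetical names, not the iterators touched by these patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ALL_ADDRESS_SPACES (-1)	/* hypothetical name for the "all" wildcard */

struct mmu_root {
	int as_id;	/* address space the root belongs to */
	bool invalid;	/* invalid roots are unreachable from the guest */
};

/* Decide whether a walk should visit @root. */
static bool root_matches(const struct mmu_root *root, int as_id,
			 bool only_valid)
{
	if (only_valid && root->invalid)
		return false;

	return as_id == ALL_ADDRESS_SPACES || root->as_id == as_id;
}

/* Count the roots a walk with the given filters would process. */
static size_t count_roots_to_process(const struct mmu_root *roots, size_t nr,
				     int as_id, bool only_valid)
{
	size_t n = 0;

	assert(roots != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (root_matches(&roots[i], as_id, only_valid))
			n++;
	}
	return n;
}
```

A walk that exists purely to change guest-visible behavior would pass
only_valid = true, while a walk that must tear down everything would pass
false.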

All told, this significantly reduces mmu_lock contention when doing a fast
zap, i.e. when deleting memslots, and takes the worst case latency for a
vCPU to load a new root from >3ms to <100us for large-ish VMs (100+ vCPUs).
For small and medium sized VMs (<24 vCPUs), the vast majority of loads
take less than 1us, with the worst case being <10us, versus >200us without
this series.

Note, I did all of the latency testing before the holidays, and then
managed to lose almost all of my notes, which is why I don't have more
precise data on the exact setups and latency bins.  /facepalm

Sean Christopherson (8):
  KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity
  KVM: x86/mmu: Don't do TLB flush when zappings SPTEs in invalid roots
  KVM: x86/mmu: Allow passing '-1' for "all" as_id for TDP MMU iterators
  KVM: x86/mmu: Skip invalid roots when zapping leaf SPTEs for GFN range
  KVM: x86/mmu: Skip invalid TDP MMU roots when write-protecting SPTEs
  KVM: x86/mmu: Check for usable TDP MMU root while holding mmu_lock for
    read
  KVM: x86/mmu: Alloc TDP MMU roots while holding mmu_lock for read
  KVM: x86/mmu: Free TDP MMU roots while holding mmy_lock for read

 arch/x86/kvm/mmu/mmu.c     |  33 +++++++---
 arch/x86/kvm/mmu/tdp_mmu.c | 124 ++++++++++++++++++++++++++-----------
 arch/x86/kvm/mmu/tdp_mmu.h |   2 +-
 3 files changed, 111 insertions(+), 48 deletions(-)


base-commit: 1c6d984f523f67ecfad1083bb04c55d91977bb15

Comments

Sean Christopherson Feb. 23, 2024, 1:35 a.m. UTC | #1
On Wed, 10 Jan 2024 18:00:40 -0800, Sean Christopherson wrote:
> This series is the result of digging into why deleting a memslot, which on
> x86 forces all vCPUs to reload a new MMU root, causes noticeably more jitter
> in vCPUs and other tasks when running with the TDP MMU than the Shadow MMU
> (with TDP enabled).
> 
> Patch 1 addresses the most obvious issue by simply zapping at a finer
> granularity so that if a different task, e.g. a vCPU, wants to run on the
> pCPU doing the zapping, it doesn't have to wait for KVM to zap an entire
> 1GiB region, which can take a hundreds of microseconds (or more).  The
> shadow MMU checks for need_resched() (and mmu_lock contention, see below)
> every 10 zaps, which is why the shadow MMU doesn't induce the same level
> of jitter.
> 
> [...]

Applied to kvm-x86 mmu, thanks!

[1/8] KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity
      https://github.com/kvm-x86/linux/commit/8ca983631f3c
[2/8] KVM: x86/mmu: Don't do TLB flush when zappings SPTEs in invalid roots
      https://github.com/kvm-x86/linux/commit/fcdffe97f80e
[3/8] KVM: x86/mmu: Allow passing '-1' for "all" as_id for TDP MMU iterators
      https://github.com/kvm-x86/linux/commit/6577f1efdff4
[4/8] KVM: x86/mmu: Skip invalid roots when zapping leaf SPTEs for GFN range
      https://github.com/kvm-x86/linux/commit/99b85fda91b1
[5/8] KVM: x86/mmu: Skip invalid TDP MMU roots when write-protecting SPTEs
      https://github.com/kvm-x86/linux/commit/d746182337c2
[6/8] KVM: x86/mmu: Check for usable TDP MMU root while holding mmu_lock for read
      https://github.com/kvm-x86/linux/commit/f5238c2a60f1
[7/8] KVM: x86/mmu: Alloc TDP MMU roots while holding mmu_lock for read
      https://github.com/kvm-x86/linux/commit/dab285e4ec73
[8/8] KVM: x86/mmu: Free TDP MMU roots while holding mmy_lock for read
      https://github.com/kvm-x86/linux/commit/576a15de8d29

--
https://github.com/kvm-x86/linux/tree/next