
[0/5] KVM: VMX: Drop MTRR virtualization, honor guest PAT

Message ID: 20240309010929.1403984-1-seanjc@google.com

Message

Sean Christopherson March 9, 2024, 1:09 a.m. UTC
First, rip out KVM's support for virtualizing guest MTRRs on VMX.  The
code is costly to maintain, a drag on guest boot performance, imperfect,
and not required for functional correctness with modern guest kernels.
Patch 1's changelog has many more details.

With MTRR virtualization gone, always honor guest PAT on Intel CPUs that
support self-snoop, as such CPUs are guaranteed to maintain coherency
even if the guest is aliasing memtypes, e.g. if the host is using WB but
the guest is using WC.  Honoring guest PAT is desirable for use cases
where the guest must use WC when accessing memory that is DMA'd from a
non-coherent device that does NOT bounce through VFIO, e.g. for mediated
virtual GPUs.
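To make the WB-versus-WC aliasing concrete, here is a simplified C model of how the EPT IPAT ("ignore PAT") bit changes the effective memory type of a guest access. It assumes, rather than quotes from the series, that ordinary guest RAM is mapped with a WB EPT memtype and that guest CR0.CD is clear; the enum values follow the Intel SDM encodings, and effective_memtype_wb_ept() is a hypothetical helper, not a KVM function. Honoring guest PAT corresponds to leaving IPAT clear.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 memory-type encodings used by both PAT entries and the EPT memtype field. */
enum memtype {
	MT_UC = 0,       /* uncacheable */
	MT_WC = 1,       /* write-combining */
	MT_WT = 4,       /* write-through */
	MT_WP = 5,       /* write-protected */
	MT_WB = 6,       /* write-back */
	MT_UC_MINUS = 7, /* UC-, valid only in PAT */
};

/*
 * Hypothetical helper: effective memtype of a guest access to ordinary RAM,
 * assuming the EPT memtype is WB and guest CR0.CD=0.
 *   ipat == true:  the guest PAT is ignored and the EPT memtype (WB) is used.
 *   ipat == false: the guest PAT is honored; combined with a WB EPT memtype,
 *                  the guest's type is used unchanged, except UC- becomes UC.
 */
enum memtype effective_memtype_wb_ept(bool ipat, enum memtype guest_pat)
{
	/* Encodings 2 and 3 are reserved in PAT. */
	assert(guest_pat != 2 && guest_pat != 3);

	if (ipat)
		return MT_WB;
	if (guest_pat == MT_UC_MINUS)
		return MT_UC;
	return guest_pat;
}
```

With IPAT set, effective_memtype_wb_ept(true, MT_WC) yields MT_WB: the guest believes the buffer is WC while the access is actually WB. With IPAT clear, the same request yields MT_WC, which is what a guest sharing a buffer with a non-coherent device expects.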

The SRCU patch adds an API that is effectively documentation for the
memory barrier in srcu_read_lock().  Intel CPUs with self-snoop require
a memory barrier after VM-Exit to ensure coherency, and KVM always does
a srcu_read_lock() before reading guest memory after VM-Exit.  Relying
on SRCU to provide the barrier allows KVM to avoid emitting a redundant
barrier of its own.
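The idea of an API that only documents an existing barrier can be shown with a toy userspace model built on C11 atomics instead of the kernel's SRCU. The lock's full fence plays the role of the barrier in srcu_read_lock(), and the after-lock helper compiles to nothing because it only records that the caller depends on that ordering. Everything here (the toy_* names and toy_handle_exit()) is hypothetical; it is not the kernel implementation, and the real API's name and signature may differ.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy stand-in for an SRCU domain: one reader counter instead of per-CPU counters. */
struct toy_srcu {
	atomic_long readers;
};

/* Toy read lock: count the reader, then execute a full memory barrier. */
static int toy_srcu_read_lock(struct toy_srcu *s)
{
	atomic_fetch_add_explicit(&s->readers, 1, memory_order_relaxed);
	/* Plays the role of the full barrier inside srcu_read_lock(). */
	atomic_thread_fence(memory_order_seq_cst);
	return 0; /* index of the only counter */
}

/* Toy read unlock: full barrier first, then drop the reader count. */
static void toy_srcu_read_unlock(struct toy_srcu *s, int idx)
{
	assert(idx == 0);
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_sub_explicit(&s->readers, 1, memory_order_relaxed);
}

/*
 * Emits no instructions: toy_srcu_read_lock() already ended with a full
 * barrier, so this helper only documents that the caller relies on it.
 */
static inline void toy_smp_mb__after_srcu_read_lock(void)
{
}

/* Hypothetical VM-Exit handler: take the lock, then read guest memory. */
long toy_handle_exit(struct toy_srcu *s, const long *guest_mem)
{
	int idx = toy_srcu_read_lock(s);

	toy_smp_mb__after_srcu_read_lock(); /* no redundant second fence */
	long val = guest_mem[0];
	toy_srcu_read_unlock(s, idx);
	return val;
}
```

The empty helper costs nothing at run time, but it marks the one spot that would need a real fence if the lock ever stopped providing a full barrier.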

This series needs a _lot_ more testing; I arguably should have tagged it
RFC, but I'm feeling lucky.

Sean Christopherson (3):
  KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes
  KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
  KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

Yan Zhao (2):
  srcu: Add an API for a memory barrier after SRCU read lock
  KVM: x86: Ensure a full memory barrier is emitted in the VM-Exit path

 Documentation/virt/kvm/api.rst        |   6 +-
 Documentation/virt/kvm/x86/errata.rst |  18 +
 arch/x86/include/asm/kvm_host.h       |  15 +-
 arch/x86/kvm/mmu.h                    |   7 +-
 arch/x86/kvm/mmu/mmu.c                |  35 +-
 arch/x86/kvm/mtrr.c                   | 644 ++------------------------
 arch/x86/kvm/vmx/vmx.c                |  40 +-
 arch/x86/kvm/x86.c                    |  24 +-
 arch/x86/kvm/x86.h                    |   4 -
 include/linux/srcu.h                  |  14 +
 10 files changed, 105 insertions(+), 702 deletions(-)


base-commit: 964d0c614c7f71917305a5afdca9178fe8231434

Comments

Ma, Yongwei March 22, 2024, 9:29 a.m. UTC | #1
> First, rip out KVM's support for virtualizing guest MTRRs on VMX.  The code is
> costly to maintain, a drag on guest boot performance, imperfect, and not
> required for functional correctness with modern guest kernels.  Many details
> in patch 1's changelog.
> [...]

Verified iGPU passthrough (GVT-d) on Intel platforms (TGL Core(TM) i5-1135G7, ADL Core(TM) i7-12700, RPL, and MTL Ultra 7) with Ubuntu 22.04 LTS.
Both an Ubuntu 22.04 VM and a Windows 10 VM booted successfully.
The glmark2 3D benchmark runs as expected in the guest.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>

Best Regards,
Yongwei Ma
Yan Zhao March 22, 2024, 1:08 p.m. UTC | #2
Xiangfei found a failure in the kvm-unit-tests test rdtsc_vmexit_diff_test,
with the following error log:
"FAIL: RDTSC to VM-exit delta too high in 100 of 100 iterations, last = 902
FAIL: Guest didn't run to completion."

On my side, I fixed it by adding the following lines to rdtsc_vmexit_diff_test
before entering the guest:
    vmcs_write(HOST_PAT, 0x6);                    /* PA0 = WB, PA1-PA7 = UC */
    vmcs_clear_bits(EXI_CONTROLS, EXI_SAVE_PAT);  /* don't save guest PAT on VM-exit */
    vmcs_set_bits(EXI_CONTROLS, EXI_LOAD_PAT);    /* load HOST_PAT on VM-exit */
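For readers decoding the HOST_PAT value in the rdtsc_vmexit_diff_test change: IA32_PAT holds eight one-byte entries, PA0 through PA7, whose low three bits select a memory type. This sketch uses the Intel SDM encodings and bit positions (Save IA32_PAT is VM-exit control bit 18, Load IA32_PAT is bit 19) to show that 0x6 makes PA0 write-back and leaves PA1-PA7 uncacheable, and mirrors the two control-bit changes on a plain bitmask. pat_entry(), apply_pat_exit_controls() and the EXIT_* macros are hypothetical names, not kvm-unit-tests APIs.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_PAT entry encodings (Intel SDM); 2 and 3 are reserved. */
enum pat_type {
	PAT_UC = 0,
	PAT_WC = 1,
	PAT_WT = 4,
	PAT_WP = 5,
	PAT_WB = 6,
	PAT_UC_MINUS = 7,
};

/* VM-exit control bits that govern IA32_PAT (hypothetical macro names). */
#define EXIT_SAVE_IA32_PAT (1u << 18)
#define EXIT_LOAD_IA32_PAT (1u << 19)

/* Memory type of entry PAi (i = 0..7): each entry is one byte, type in bits 2:0. */
unsigned int pat_entry(uint64_t pat, unsigned int i)
{
	assert(i < 8);
	return (unsigned int)((pat >> (i * 8)) & 0x7);
}

/* Clear "save IA32_PAT" and set "load IA32_PAT", as in the unit-test change. */
uint32_t apply_pat_exit_controls(uint32_t exit_controls)
{
	exit_controls &= ~(uint32_t)EXIT_SAVE_IA32_PAT;
	exit_controls |= EXIT_LOAD_IA32_PAT;
	return exit_controls;
}
```

pat_entry(0x6, 0) returns PAT_WB and every other entry of 0x6 decodes to PAT_UC. For comparison, the architectural power-on value 0x0007040600070406 decodes PA0-PA3 as WB, WT, UC- and UC.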
Ma, XiangfeiX March 25, 2024, 6:56 a.m. UTC | #3
Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>

I have verified that this change resolves the issue.

-----Original Message-----
From: Zhao, Yan Y <yan.y.zhao@intel.com> 
Sent: Friday, March 22, 2024 9:08 PM
To: Sean Christopherson <seanjc@google.com>; Ma, XiangfeiX <xiangfeix.ma@intel.com>; Hao, Xudong <xudong.hao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>; Lai Jiangshan <jiangshanlai@gmail.com>; Paul E. McKenney <paulmck@kernel.org>; Josh Triplett <josh@joshtriplett.org>; kvm@vger.kernel.org; rcu@vger.kernel.org; linux-kernel@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; Yiwei Zhang <zzyiwei@google.com>
Subject: Re: [PATCH 0/5] KVM: VMX: Drop MTRR virtualization, honor guest PAT

Ma, XiangfeiX March 25, 2024, 8:02 a.m. UTC | #4
Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>

The test environment is an EMR-2S3 platform with CentOS 9 (kernel version 6.8.0-rc4).
Test cases include cpu, amx, umip, ptvmx, IPIv, vtd, PMU, SGX, kvm-unit-tests, KVM selftests, etc., plus guest workload tests using Netperf (bridged networking) and SPECjbb (passthrough NIC).
Apart from the known issue and the previously mentioned rdtsc_vmexit_diff_test failure, no other issues were found.

Sean Christopherson June 5, 2024, 11:20 p.m. UTC | #5
On Fri, 08 Mar 2024 17:09:24 -0800, Sean Christopherson wrote:
> First, rip out KVM's support for virtualizing guest MTRRs on VMX.  The
> code is costly to maintain, a drag on guest boot performance, imperfect, and
> not required for functional correctness with modern guest kernels.  Many
> details in patch 1's changelog.
> 
> With MTRR virtualization gone, always honor guest PAT on Intel CPUs that
> support self-snoop, as such CPUs are guaranteed to maintain coherency
> even if the guest is aliasing memtypes, e.g. if the host is using WB but
> the guest is using WC.  Honoring guest PAT is desirable for use cases
> where the guest must use WC when accessing memory that is DMA'd from a
> non-coherent device that does NOT bounce through VFIO, e.g. for mediated
> virtual GPUs.
> 
> [...]

Applied to kvm-x86 mtrrs, to get as much testing as possible before a potential
merge in 6.11.

Paul, if you can take a gander at patch 3, it would be much appreciated.

Thanks!

[1/5] KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes
      https://github.com/kvm-x86/linux/commit/0a7b73559b39
[2/5] KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
      https://github.com/kvm-x86/linux/commit/e1548088ff54
[3/5] srcu: Add an API for a memory barrier after SRCU read lock
      https://github.com/kvm-x86/linux/commit/fcfe671e0879
[4/5] KVM: x86: Ensure a full memory barrier is emitted in the VM-Exit path
      https://github.com/kvm-x86/linux/commit/eb8d8fc29286
[5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop
      https://github.com/kvm-x86/linux/commit/95200f24b862

--
https://github.com/kvm-x86/linux/tree/next
Paul E. McKenney June 6, 2024, 12:03 a.m. UTC | #6
On Wed, Jun 05, 2024 at 04:20:34PM -0700, Sean Christopherson wrote:
> On Fri, 08 Mar 2024 17:09:24 -0800, Sean Christopherson wrote:
> > [...]
> 
> Applied to kvm-x86 mtrrs, to get as much testing as possible before a potential
> merge in 6.11.
> 
> Paul, if you can take a gander at patch 3, it would be much appreciated.
> 
> Thanks!
> 
> [1/5] KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes
>       https://github.com/kvm-x86/linux/commit/0a7b73559b39
> [2/5] KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
>       https://github.com/kvm-x86/linux/commit/e1548088ff54
> [3/5] srcu: Add an API for a memory barrier after SRCU read lock
>       https://github.com/kvm-x86/linux/commit/fcfe671e0879

Looks straightforward enough.  We could combine this with the existing
smp_mb__after_srcu_read_unlock(), but if we did that, someone would no
doubt come up with some clever optimization that provided a full barrier
in srcu_read_lock() but not in srcu_read_unlock() or vice versa.  So:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
