| Message ID | 20211208000359.2853257-16-yang.zhong@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | AMX Support in KVM |
On 12/8/21 01:03, Yang Zhong wrote:
> 	kvm_steal_time_set_preempted(vcpu);
> 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>
> +	if (vcpu->preempted)
> +		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);
> +

Instead of checking vcpu->preempted, can you instead check if the active
FPU is the guest FPU?  That is, save if current->thread.fpu->fpstate->is_guest?

Paolo
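A minimal sketch of the check Paolo suggests, assuming the fpstate->is_guest flag introduced earlier in the series is still set whenever the guest FPU registers are loaded (the helper name fpu_save_guest_xfd_err() is taken from the patch under review):

	/* Sketch: save XFD_ERR only while the guest fpstate is still the
	 * active one, i.e. fpu_swap_kvm_fpstate() has not switched back
	 * to the task fpstate yet. */
	if (current->thread.fpu.fpstate->is_guest)
		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);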
On 12/8/21 01:03, Yang Zhong wrote:
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -219,6 +219,11 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		kvm_apic_set_version(vcpu);
>  	}
>
> +	/* Enable saving guest XFD_ERR */
> +	best = kvm_find_cpuid_entry(vcpu, 7, 0);
> +	if (best && cpuid_entry_has(best, X86_FEATURE_AMX_TILE))
> +		vcpu->arch.guest_fpu.xfd_err = 0;
> +

This is incorrect.  Instead it should check whether leaf 0xD includes
any dynamic features.

Paolo
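One possible form of that check, as a sketch only; testing XFEATURE_MASK_USER_DYNAMIC against sub-leaf 0 EAX is an assumption based on the host-side naming for the dynamically enabled xfeatures, not something the review spells out:

	/* Sketch: enable XFD_ERR handling only if guest CPUID leaf 0xD
	 * exposes at least one XFD-managed (dynamic) xstate feature. */
	best = kvm_find_cpuid_entry(vcpu, 0xd, 0);
	if (best && (best->eax & XFEATURE_MASK_USER_DYNAMIC))
		vcpu->arch.guest_fpu.xfd_err = 0;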
On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 5089f2e7dc22..9811dc98d550 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>  	fpstate->is_guest = true;
>
>  	gfpu->fpstate = fpstate;
> +	gfpu->xfd_err = XFD_ERR_GUEST_DISABLED;

This wants to be part of the previous patch, which introduces the field.

>  	gfpu->user_xfeatures = fpu_user_cfg.default_features;
>  	gfpu->user_perm = fpu_user_cfg.default_features;
>  	fpu_init_guest_permissions(gfpu);
> @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
>  		fpu->fpstate = guest_fps;
>  		guest_fps->in_use = true;
>  	} else {
> +		fpu_save_guest_xfd_err(guest_fpu);

Hmm. See below.

>  		guest_fps->in_use = false;
>  		fpu->fpstate = fpu->__task_fpstate;
>  		fpu->__task_fpstate = NULL;
> @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_steal_time_set_preempted(vcpu);
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>
> +	if (vcpu->preempted)
> +		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);

I'm not really exited about the thought of an exception cause register
in guest clobbered state.

Aside of that I really have to ask the question why all this is needed?

#NM in the guest is slow path, right? So why are you trying to optimize
for it?

The straight forward solution to this is:

  1) Trap #NM and MSR_XFD_ERR write

  2) When the guest triggers #NM is takes an VMEXIT and the host
     does:

         rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

     injects the #NM and goes on.

  3) When the guest writes to MSR_XFD_ERR it takes an VMEXIT and
     the host does:

         vcpu->arch.guest_fpu.xfd_err = msrval;
         wrmsrl(MSR_XFD_ERR, msrval);

     and goes back.

  4) Before entering the preemption disabled section of the VCPU loop
     do:

         if (vcpu->arch.guest_fpu.xfd_err)
                 wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

  5) Before leaving the preemption disabled section of the VCPU loop
     do:

         if (vcpu->arch.guest_fpu.xfd_err)
                 wrmsrl(MSR_XFD_ERR, 0);

It's really that simple and pretty much 0 overhead for the regular case.

If the guest triggers #NM with a high frequency then taking the VMEXITs
is the least of the problems. That's not a realistic use case, really.

Hmm?

Thanks,

        tglx
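Putting those five steps into code, a rough sketch could look like the following. The helper names (handle_nm_exit(), kvm_set_xfd_err(), xfd_err_enter_guest(), xfd_err_leave_guest()) are hypothetical, and MSR_IA32_XFD_ERR is the arch-defined name for the MSR Thomas abbreviates as MSR_XFD_ERR; step 3 is later corrected by Paolo (see below).

	/* 2) #NM VMEXIT handler, run while preemption/IRQs are still off */
	static void handle_nm_exit(struct kvm_vcpu *vcpu)
	{
		/* Latch the guest's XFD error cause before anything can clobber it */
		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
		kvm_queue_exception(vcpu, NM_VECTOR);	/* reflect #NM into the guest */
	}

	/* 3) WRMSR(MSR_IA32_XFD_ERR) emulation, as originally proposed */
	static void kvm_set_xfd_err(struct kvm_vcpu *vcpu, u64 msrval)
	{
		vcpu->arch.guest_fpu.xfd_err = msrval;
		wrmsrl(MSR_IA32_XFD_ERR, msrval);
	}

	/* 4) before entering the preemption disabled section of the run loop */
	static void xfd_err_enter_guest(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.guest_fpu.xfd_err)
			wrmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
	}

	/* 5) before leaving the preemption disabled section of the run loop */
	static void xfd_err_leave_guest(struct kvm_vcpu *vcpu)
	{
		if (vcpu->arch.guest_fpu.xfd_err)
			wrmsrl(MSR_IA32_XFD_ERR, 0);
	}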
On 12/11/21 01:10, Thomas Gleixner wrote:
>  2) When the guest triggers #NM is takes an VMEXIT and the host
>     does:
>
>         rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
>
>     injects the #NM and goes on.
>
>  3) When the guest writes to MSR_XFD_ERR it takes an VMEXIT and
>     the host does:
>
>         vcpu->arch.guest_fpu.xfd_err = msrval;
>         wrmsrl(MSR_XFD_ERR, msrval);

No wrmsrl here I think, the host value is 0 and should stay so.  Instead
the wrmsrl will happen the next time the VCPU loop is entred.

Paolo
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Saturday, December 11, 2021 8:11 AM
>
> On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index 5089f2e7dc22..9811dc98d550 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
> >  	fpstate->is_guest = true;
> >
> >  	gfpu->fpstate = fpstate;
> > +	gfpu->xfd_err = XFD_ERR_GUEST_DISABLED;
>
> This wants to be part of the previous patch, which introduces the field.
>
> >  	gfpu->user_xfeatures = fpu_user_cfg.default_features;
> >  	gfpu->user_perm = fpu_user_cfg.default_features;
> >  	fpu_init_guest_permissions(gfpu);
> > @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
> >  		fpu->fpstate = guest_fps;
> >  		guest_fps->in_use = true;
> >  	} else {
> > +		fpu_save_guest_xfd_err(guest_fpu);
>
> Hmm. See below.
>
> >  		guest_fps->in_use = false;
> >  		fpu->fpstate = fpu->__task_fpstate;
> >  		fpu->__task_fpstate = NULL;
> > @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  	kvm_steal_time_set_preempted(vcpu);
> >  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >
> > +	if (vcpu->preempted)
> > +		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);
>
> I'm not really exited about the thought of an exception cause register
> in guest clobbered state.
>
> Aside of that I really have to ask the question why all this is needed?
>
> #NM in the guest is slow path, right? So why are you trying to optimize
> for it?

This is really good information. The current logic is obviously
based on the assumption that #NM is frequently triggered.

>
> The straight forward solution to this is:
>
>  1) Trap #NM and MSR_XFD_ERR write

and #NM vmexit handler should be called in kvm_x86_handle_exit_irqoff()
before preemption is enabled, otherwise there is still a small window
where MSR_XFD_ERR might be clobbered after preemption enable and
before #NM handler is actually called.

>
>  2) When the guest triggers #NM is takes an VMEXIT and the host
>     does:
>
>         rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
>
>     injects the #NM and goes on.
>
>  3) When the guest writes to MSR_XFD_ERR it takes an VMEXIT and
>     the host does:
>
>         vcpu->arch.guest_fpu.xfd_err = msrval;
>         wrmsrl(MSR_XFD_ERR, msrval);
>
>     and goes back.
>
>  4) Before entering the preemption disabled section of the VCPU loop
>     do:
>
>         if (vcpu->arch.guest_fpu.xfd_err)
>                 wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
>
>  5) Before leaving the preemption disabled section of the VCPU loop
>     do:
>
>         if (vcpu->arch.guest_fpu.xfd_err)
>                 wrmsrl(MSR_XFD_ERR, 0);
>
> It's really that simple and pretty much 0 overhead for the regular case.

Much cleaner.

>
> If the guest triggers #NM with a high frequency then taking the VMEXITs
> is the least of the problems. That's not a realistic use case, really.
>
> Hmm?
>
> Thanks,
>
>         tglx

Thanks
Kevin
> From: Paolo Bonzini
> Sent: Saturday, December 11, 2021 9:32 AM
>
> On 12/11/21 01:10, Thomas Gleixner wrote:
> >  2) When the guest triggers #NM is takes an VMEXIT and the host
> >     does:
> >
> >         rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
> >
> >     injects the #NM and goes on.
> >
> >  3) When the guest writes to MSR_XFD_ERR it takes an VMEXIT and
> >     the host does:
> >
> >         vcpu->arch.guest_fpu.xfd_err = msrval;
> >         wrmsrl(MSR_XFD_ERR, msrval);
>
> No wrmsrl here I think, the host value is 0 and should stay so. Instead
> the wrmsrl will happen the next time the VCPU loop is entred.
>

To elaborate I guess the reason is because MSR_XFD_ERR should always
contain host value 0 after preemption is enabled, while WRMSR emulation
is called with preemption enabled. Then we just need wait for the next
time the vcpu loop is entered to restore the guest value after preemption
is disabled.
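In other words, the WRMSR emulation would only record the value and leave the hardware MSR at the host value of 0. A sketch with Paolo's correction applied (helper name hypothetical):

	/* Sketch: WRMSR emulation runs with preemption enabled, so only record
	 * the guest value here; the hardware MSR is written later, inside the
	 * preemption disabled section, right before VM entry. */
	static void kvm_set_xfd_err(struct kvm_vcpu *vcpu, u64 msrval)
	{
		vcpu->arch.guest_fpu.xfd_err = msrval;	/* no wrmsrl() here */
	}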
On Sat, Dec 11 2021 at 02:31, Paolo Bonzini wrote:
> On 12/11/21 01:10, Thomas Gleixner wrote:
>>  2) When the guest triggers #NM is takes an VMEXIT and the host
>>     does:
>>
>>         rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
>>
>>     injects the #NM and goes on.
>>
>>  3) When the guest writes to MSR_XFD_ERR it takes an VMEXIT and
>>     the host does:
>>
>>         vcpu->arch.guest_fpu.xfd_err = msrval;
>>         wrmsrl(MSR_XFD_ERR, msrval);
>
> No wrmsrl here I think, the host value is 0 and should stay so. Instead
> the wrmsrl will happen the next time the VCPU loop is entred.

I assumed this can be handled in the fast path, but either way.
Kevin,

On Sat, Dec 11 2021 at 03:07, Kevin Tian wrote:
>> From: Thomas Gleixner <tglx@linutronix.de>
>> #NM in the guest is slow path, right? So why are you trying to optimize
>> for it?
>
> This is really good information. The current logic is obviously
> based on the assumption that #NM is frequently triggered.

More context.

When an application want's to use AMX, it invokes the prctl() which
grants permission. If permission is granted then still the kernel FPU
state buffers are default size and XFD is armed.

When a thread of that process issues the first AMX (tile) instruction,
then #NM is raised.

The #NM handler does:

  1) Read MSR_XFD_ERR. If 0, goto regular #NM

  2) Write MSR_XFD_ERR to 0

  3) Check whether the process has permission granted. If not,
     raise SIGILL and return.

  4) Allocate and install a larger FPU state buffer for the task.
     If allocation fails, raise SIGSEGV and return.

  5) Disarm XFD for that task

That means one thread takes at max. one AMX/XFD related #NM during its
lifetime, which means two VMEXITs.

If there are other XFD controlled facilities in the future, then it will
be NR_USED_XFD_CONTROLLED_FACILITIES * 2 VMEXITs per thread which uses
them. Not the end of the world either.

Looking at the targeted application space it's pretty unlikely that
tasks which utilize AMX are going to be so short lived that the overhead
of these VMEXITs really matters.

This of course can be revisited when there is a sane use case, but
optimizing for it prematurely does not buy us anything else than
pointless complexity.

>> The straight forward solution to this is:
>>
>>  1) Trap #NM and MSR_XFD_ERR write
>
> and #NM vmexit handler should be called in kvm_x86_handle_exit_irqoff()
> before preemption is enabled, otherwise there is still a small window
> where MSR_XFD_ERR might be clobbered after preemption enable and
> before #NM handler is actually called.

Yes.

Thanks,

        tglx
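For reference, a condensed sketch of that host-side #NM path; xstate_permitted() and the exact signal plumbing are placeholders for illustration, not the real handler:

	static bool handle_xfd_nm(struct pt_regs *regs)
	{
		u64 xfd_err;

		/* 1) Read the error MSR; zero means this is a regular #NM */
		rdmsrl(MSR_IA32_XFD_ERR, xfd_err);
		if (!xfd_err)
			return false;

		/* 2) Clear it so a later #NM starts from a clean state */
		wrmsrl(MSR_IA32_XFD_ERR, 0);

		/* 3) No permission granted via prctl() -> SIGILL */
		if (!xstate_permitted(current, xfd_err)) {	/* hypothetical check */
			force_sig(SIGILL);
			return true;
		}

		/* 4) Reallocate a larger fpstate buffer, 5) disarm XFD on success;
		 *    on allocation failure raise SIGSEGV. */
		if (xfd_enable_feature(xfd_err))		/* hypothetical helper */
			force_sig(SIGSEGV);

		return true;
	}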
> From: Thomas Gleixner <tglx@linutronix.de>
> Sent: Saturday, December 11, 2021 9:29 PM
>
> Kevin,
>
> On Sat, Dec 11 2021 at 03:07, Kevin Tian wrote:
> >> From: Thomas Gleixner <tglx@linutronix.de>
> >> #NM in the guest is slow path, right? So why are you trying to optimize
> >> for it?
> >
> > This is really good information. The current logic is obviously
> > based on the assumption that #NM is frequently triggered.
>
> More context.
>
> When an application want's to use AMX, it invokes the prctl() which
> grants permission. If permission is granted then still the kernel FPU
> state buffers are default size and XFD is armed.
>
> When a thread of that process issues the first AMX (tile) instruction,
> then #NM is raised.
>
> The #NM handler does:
>
>   1) Read MSR_XFD_ERR. If 0, goto regular #NM
>
>   2) Write MSR_XFD_ERR to 0
>
>   3) Check whether the process has permission granted. If not,
>      raise SIGILL and return.
>
>   4) Allocate and install a larger FPU state buffer for the task.
>      If allocation fails, raise SIGSEGV and return.
>
>   5) Disarm XFD for that task
>
> That means one thread takes at max. one AMX/XFD related #NM during its
> lifetime, which means two VMEXITs.
>
> If there are other XFD controlled facilities in the future, then it will
> be NR_USED_XFD_CONTROLLED_FACILITIES * 2 VMEXITs per thread which uses
> them. Not the end of the world either.
>
> Looking at the targeted application space it's pretty unlikely that
> tasks which utilize AMX are going to be so short lived that the overhead
> of these VMEXITs really matters.
>
> This of course can be revisited when there is a sane use case, but
> optimizing for it prematurely does not buy us anything else than
> pointless complexity.

I get all above. I guess the original open is also about the frequency
of #NM not due to XFD. For Linux guest looks it's not a problem since
CR0.TS is not set now when math emulation is not required:

DEFINE_IDTENTRY(exc_device_not_available)
{
	...

	/* This should not happen. */
	if (WARN(cr0 & X86_CR0_TS, "CR0.TS was set")) {
		/* Try to fix it up and carry on. */
		write_cr0(cr0 & ~X86_CR0_TS);
	} else {
		/*
		 * Something terrible happened, and we're better off trying
		 * to kill the task than getting stuck in a never-ending
		 * loop of #NM faults.
		 */
		die("unexpected #NM exception", regs, 0);
	}
}

It may affect guest which still uses CR0.TS to do lazy save. But likely
modern OSes all move to eager save approach so always trapping #NM
should be fine. Is this understanding correct?

Thanks
Kevin
On 12/12/21 02:50, Tian, Kevin wrote:
>>
>> If there are other XFD controlled facilities in the future, then it will
>> be NR_USED_XFD_CONTROLLED_FACILITIES * 2 VMEXITs per thread which uses
>> them. Not the end of the world either.
>>
>> Looking at the targeted application space it's pretty unlikely that
>> tasks which utilize AMX are going to be so short lived that the overhead
>> of these VMEXITs really matters.
>>
>> This of course can be revisited when there is a sane use case, but
>> optimizing for it prematurely does not buy us anything else than
>> pointless complexity.
>
> It may affect guest which still uses CR0.TS to do lazy save. But likely
> modern OSes all move to eager save approach so always trapping #NM
> should be fine.

You also don't need to trap #NM if CPUID includes no dynamic bits,
because then XFD will never be nonzero.

Paolo
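On VMX that condition could be wired into the CPUID update path; a sketch with a hypothetical vmx_update_nm_intercept() helper, assuming guest_supported_xcr0 already reflects the guest CPUID and that the VMCS is loaded when this runs:

	/* Sketch: intercept #NM only when the guest can actually arm XFD,
	 * i.e. its CPUID exposes at least one dynamically enabled xfeature. */
	static void vmx_update_nm_intercept(struct kvm_vcpu *vcpu)
	{
		u32 eb = vmcs_read32(EXCEPTION_BITMAP);

		if (vcpu->arch.guest_supported_xcr0 & XFEATURE_MASK_USER_DYNAMIC)
			eb |= (1u << NM_VECTOR);
		else
			eb &= ~(1u << NM_VECTOR);

		vmcs_write32(EXCEPTION_BITMAP, eb);
	}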
On Fri, Dec 10, 2021 at 11:01:15PM +0100, Paolo Bonzini wrote:
> On 12/8/21 01:03, Yang Zhong wrote:
> >--- a/arch/x86/kvm/cpuid.c
> >+++ b/arch/x86/kvm/cpuid.c
> >@@ -219,6 +219,11 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > 	kvm_apic_set_version(vcpu);
> > 	}
> >+	/* Enable saving guest XFD_ERR */
> >+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
> >+	if (best && cpuid_entry_has(best, X86_FEATURE_AMX_TILE))
> >+		vcpu->arch.guest_fpu.xfd_err = 0;
> >+
>
> This is incorrect. Instead it should check whether leaf 0xD
> includes any dynamic features.
>

  Thanks Paolo,

  So ditto for "[PATCH 04/19] kvm: x86: Check guest xstate permissions
  when KVM_SET_CPUID2".

  Yang

> Paolo
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 5089f2e7dc22..9811dc98d550 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
 	fpstate->is_guest = true;
 
 	gfpu->fpstate = fpstate;
+	gfpu->xfd_err = XFD_ERR_GUEST_DISABLED;
 	gfpu->user_xfeatures = fpu_user_cfg.default_features;
 	gfpu->user_perm = fpu_user_cfg.default_features;
 	fpu_init_guest_permissions(gfpu);
@@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
 		fpu->fpstate = guest_fps;
 		guest_fps->in_use = true;
 	} else {
+		fpu_save_guest_xfd_err(guest_fpu);
 		guest_fps->in_use = false;
 		fpu->fpstate = fpu->__task_fpstate;
 		fpu->__task_fpstate = NULL;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f3c61205bbf4..ea51b986ee67 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -219,6 +219,11 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		kvm_apic_set_version(vcpu);
 	}
 
+	/* Enable saving guest XFD_ERR */
+	best = kvm_find_cpuid_entry(vcpu, 7, 0);
+	if (best && cpuid_entry_has(best, X86_FEATURE_AMX_TILE))
+		vcpu->arch.guest_fpu.xfd_err = 0;
+
 	best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
 	if (!best)
 		vcpu->arch.guest_supported_xcr0 = 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6198b13c4846..0db8bdf273e2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -161,6 +161,7 @@ static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS] = {
 	MSR_GS_BASE,
 	MSR_KERNEL_GS_BASE,
 	MSR_IA32_XFD,
+	MSR_IA32_XFD_ERR,
 #endif
 	MSR_IA32_SYSENTER_CS,
 	MSR_IA32_SYSENTER_ESP,
@@ -7153,6 +7154,7 @@ static void update_intel_pt_cfg(struct kvm_vcpu *vcpu)
 static void vmx_update_intercept_xfd(struct kvm_vcpu *vcpu)
 {
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD, MSR_TYPE_R, false);
+	vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_RW, false);
 }
 
 static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index bf9d3051cd6c..0a00242a91e7 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,7 +340,7 @@ struct vcpu_vmx {
 	struct lbr_desc lbr_desc;
 
 	/* Save desired MSR intercept (read: pass-through) state */
-#define MAX_POSSIBLE_PASSTHROUGH_MSRS	14
+#define MAX_POSSIBLE_PASSTHROUGH_MSRS	15
 	struct {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d127b229dd29..8b033c9241d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_steal_time_set_preempted(vcpu);
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 
+	if (vcpu->preempted)
+		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);
+
 	static_call(kvm_x86_vcpu_put)(vcpu);
 	vcpu->arch.last_host_tsc = rdtsc();
 }
@@ -9951,6 +9954,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (test_thread_flag(TIF_NEED_FPU_LOAD))
 		switch_fpu_return();
 
+	fpu_restore_guest_xfd_err(&vcpu->arch.guest_fpu);
+
 	if (unlikely(vcpu->arch.switch_db_regs)) {
 		set_debugreg(0, 7);
 		set_debugreg(vcpu->arch.eff_db[0], 0);