kvm/x86: skip async_pf when in guest mode

Message ID 20161124163039.6847-1-rkagan@virtuozzo.com (mailing list archive)
State New, archived

Commit Message

Roman Kagan Nov. 24, 2016, 4:30 p.m. UTC
Async pagefault machinery assumes communication with L1 guests only: all
the state -- MSRs, apf area addresses, etc. -- is for L1.  However, it
currently doesn't check whether the vCPU is running L1 or L2, and may
inject async_pf #PFs into L2 while that state is set up for L1.

To reproduce the problem, use a host with swap enabled, run a VM on it,
run a nested VM on top, and set RSS limit for L1 on the host via
/sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
to swap it out (you may need to tighten and release it once or twice, or
create some memory load inside L1).  Very quickly L2 guest starts
receiving pagefaults with bogus %cr2 (apf tokens from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait.

To avoid that, only do async_pf stuff when executing L1 guest.

Note: this patch only fixes x86; other async_pf-capable arches may also
need something similar.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
---
 arch/x86/kvm/mmu.c | 2 +-
 arch/x86/kvm/x86.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)
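
For reference, both hunks in the patch below gate on is_guest_mode(),
which reports whether the vCPU is currently executing a nested (L2)
guest.  At the time of this thread it is a one-line flag test --
HF_GUEST_MASK is set on nested VM entry and cleared on nested VM exit
-- so the added checks are essentially free.  Roughly as it appears in
arch/x86/include/asm/kvm_host.h:

static inline bool is_guest_mode(struct kvm_vcpu *vcpu)
{
	return vcpu->arch.hflags & HF_GUEST_MASK;
}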

Comments

Roman Kagan Nov. 25, 2016, 7:15 a.m. UTC | #1
On Thu, Nov 24, 2016 at 09:49:59PM +0100, Radim Krčmář wrote:
> 2016-11-24 19:30+0300, Roman Kagan:
> > Async pagefault machinery assumes communication with L1 guests only: all
> > the state -- MSRs, apf area addresses, etc. -- is for L1.  However, it
> > currently doesn't check whether the vCPU is running L1 or L2, and may
> > inject async_pf #PFs into L2 while that state is set up for L1.
> > 
> > To reproduce the problem, use a host with swap enabled, run a VM on it,
> > run a nested VM on top, and set RSS limit for L1 on the host via
> > /sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
> > to swap it out (you may need to tighten and release it once or twice, or
> > create some memory load inside L1).  Very quickly L2 guest starts
> > receiving pagefaults with bogus %cr2 (apf tokens from the host
> > actually), and L1 guest starts accumulating tasks stuck in D state in
> > kvm_async_pf_task_wait.
> > 
> > To avoid that, only do async_pf stuff when executing L1 guest.
> > 
> > Note: this patch only fixes x86; other async_pf-capable arches may also
> > need something similar.
> > 
> > Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> > ---
> 
> Applied to kvm/queue, thanks.
> 
> The VM task in L1 could be scheduled out instead of hogging the VCPU for
> a long time, so L1 might want to handle async_pf, especially if L1 set
> KVM_ASYNC_PF_SEND_ALWAYS.  Another case happens if L1 scheduled out a
> high-priority task on async_pf and executed the low-priority VM task in
> spare time, expecting another #PF when the page is ready, which might be
> long before the next nested VM exit.
> 
> Have you considered doing a nested VM exit and delivering the async_pf
> to L1 immediately?

I haven't, but it seems to make sense indeed for "page ready" async_pfs.  

I'll have a look into it.

Thanks,
Roman.
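
Background on KVM_ASYNC_PF_SEND_ALWAYS mentioned above: the guest sets
the flag when it writes MSR_KVM_ASYNC_PF_EN to enable async_pf, telling
the host that "page not present" notifications may be delivered even
while the vCPU runs kernel code.  A loose sketch of the guest-side
enable, modelled on arch/x86/kernel/kvm.c (the function name here is
invented; the logic follows the upstream per-cpu init):

static void async_pf_guest_enable(void)
{
	u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

	pa |= KVM_ASYNC_PF_ENABLED;
	if (IS_ENABLED(CONFIG_PREEMPT))
		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
	wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
}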
Roman Kagan Nov. 25, 2016, 8:42 a.m. UTC | #2
On Fri, Nov 25, 2016 at 10:15:21AM +0300, Roman Kagan wrote:
> On Thu, Nov 24, 2016 at 09:49:59PM +0100, Radim Krčmář wrote:
> > 2016-11-24 19:30+0300, Roman Kagan:
> > > Async pagefault machinery assumes communication with L1 guests only: all
> > > the state -- MSRs, apf area addresses, etc. -- is for L1.  However, it
> > > currently doesn't check whether the vCPU is running L1 or L2, and may
> > > inject async_pf #PFs into L2 while that state is set up for L1.
> > > 
> > > To reproduce the problem, use a host with swap enabled, run a VM on it,
> > > run a nested VM on top, and set RSS limit for L1 on the host via
> > > /sys/fs/cgroup/memory/machine.slice/machine-*.scope/memory.limit_in_bytes
> > > to swap it out (you may need to tighten and release it once or twice, or
> > > create some memory load inside L1).  Very quickly L2 guest starts
> > > receiving pagefaults with bogus %cr2 (apf tokens from the host
> > > actually), and L1 guest starts accumulating tasks stuck in D state in
> > > kvm_async_pf_task_wait.
> > > 
> > > To avoid that, only do async_pf stuff when executing L1 guest.
> > > 
> > > Note: this patch only fixes x86; other async_pf-capable arches may also
> > > need something similar.
> > > 
> > > Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> > > ---
> > 
> > Applied to kvm/queue, thanks.
> > 
> > The VM task in L1 could be scheduled out instead of hogging the VCPU for
> > a long time, so L1 might want to handle async_pf, especially if L1 set
> > KVM_ASYNC_PF_SEND_ALWAYS.  Another case happens if L1 scheduled out a
> > high-priority task on async_pf and executed the low-priority VM task in
> > spare time, expecting another #PF when the page is ready, which might be
> > long before the next nested VM exit.
> > 
> > Have you considered doing a nested VM exit and delivering the async_pf
> > to L1 immediately?
> 
> I haven't, but it seems to make sense indeed for "page ready" async_pfs.  
> 
> I'll have a look into it.

What's the correct way to kick L2 to L1 from the host?  I failed to find
one from a brief skimming through the code.  We need a sensible exit
reason delivered to L1 (probably "external interrupt" will do) but I
don't see a method to do so without actually injecting an interrupt into
L1, which may well confuse it.  Any suggestion?

Thanks,
Roman.
Paolo Bonzini Nov. 25, 2016, 8:51 a.m. UTC | #3
> What's the correct way to kick L2 to L1 from the host?  I failed to find
> one from a brief skimming through the code.  We need a sensible exit
> reason delivered to L1 (probably "external interrupt" will do) but I
> don't see a method to do so without actually injecting an interrupt into
> L1, which may well confuse it.  Any suggestion?

Perhaps check for async page faults in nested_vmx_check_exception and
nested_svm_intercept, before testing the exception bitmap?

Paolo
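
To make the suggestion above concrete, here is a minimal, untested
sketch of what the VMX side could look like; the SVM side in
nested_svm_intercept() would need an analogous check.
vcpu_has_pending_apf() is invented for illustration, and the body is a
simplified paraphrase rather than the actual kernel function -- in
particular, the exit interruption info and the exit qualification
carrying the apf token would still have to be synthesized properly:

static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
{
	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

	/*
	 * Hypothetical addition: an async_pf #PF is strictly L1
	 * business, so reflect it to L1 as an exception VM exit
	 * regardless of L1's exception bitmap.
	 */
	if (nr == PF_VECTOR && vcpu_has_pending_apf(vcpu)) {
		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
				  vmcs_read32(VM_EXIT_INTR_INFO),
				  vmcs_readl(EXIT_QUALIFICATION));
		return 1;
	}

	/* Otherwise defer to L1's exception bitmap, as today. */
	if (!(vmcs12->exception_bitmap & (1u << nr)))
		return 0;

	nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
			  vmcs_read32(VM_EXIT_INTR_INFO),
			  vmcs_readl(EXIT_QUALIFICATION));
	return 1;
}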
Roman Kagan Nov. 25, 2016, 11:17 a.m. UTC | #4
On Fri, Nov 25, 2016 at 03:51:24AM -0500, Paolo Bonzini wrote:
> 
> > What's the correct way to kick L2 to L1 from the host?  I failed to find
> > one from a brief skimming through the code.  We need a sensible exit
> > reason delivered to L1 (probably "external interrupt" will do) but I
> > don't see a method to do so without actually injecting an interrupt into
> > L1, which may well confuse it.  Any suggestion?
> 
> Perhaps check for async page faults in nested_vmx_check_exception and
> nested_svm_intercept, before testing the exception bitmap?

Yeah I also thought about this, but hoped there's an arch-agnostic
way...  Will try this approach, then.

Thanks,
Roman.

Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9c7e98..cdafc61 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3510,7 +3510,7 @@  static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	if (!async)
 		return false; /* *pfn has correct page already */
 
-	if (!prefault && can_do_async_pf(vcpu)) {
+	if (!prefault && !is_guest_mode(vcpu) && can_do_async_pf(vcpu)) {
 		trace_kvm_try_async_get_page(gva, gfn);
 		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
 			trace_kvm_async_pf_doublefault(gva, gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04c5d96..bf11fe4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6864,7 +6864,8 @@  static int vcpu_run(struct kvm_vcpu *vcpu)
 			break;
 		}
 
-		kvm_check_async_pf_completion(vcpu);
+		if (!is_guest_mode(vcpu))
+			kvm_check_async_pf_completion(vcpu);
 
 		if (signal_pending(current)) {
 			r = -EINTR;