diff mbox

[RFC,1/2] KVM/nVMX: Cleanly exit from L2 to L1 on user-space exit

Message ID 1518791030-31765-2-git-send-email-karahmed@amazon.de (mailing list archive)
State New, archived
Headers show

Commit Message

KarimAllah Ahmed Feb. 16, 2018, 2:23 p.m. UTC
On exit to L0 user-space, always exit from L2 to L1 and synchronize the
state properly for L1. This ensures that user-space only ever sees L1
state. It also allows L1 to be saved and resumed properly. Obviously
horrible things will still happen to the L2 guest. This will be handled in
a seperate patch.

There is only a single case which requires a bit of extra care. When the
decision to switch to user space happens while handling an L1
VMRESUME/VMLAUNCH (i.e. pending_nested_run). In order to handle this
as cleanly as possible without major restructuring, we simply do not exit
to user-space in this case and give L2 another chance to actually run. We
also request an immediate exit to ensure that an exit to user space will
still happen for the L2.

The only reason I can see where an exit to user space will occur while L2
is running is because of a pending signal. The is how user space preempts
the KVM_RUN in order to save the state. L2 exits are either handled in L0
kernel or reflected to L1 and not handled in L0 user-space.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx.c              | 39 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              | 33 ++++++++++++++++++++++++++++-----
 3 files changed, 69 insertions(+), 5 deletions(-)

Comments

Paolo Bonzini Feb. 16, 2018, 2:52 p.m. UTC | #1
On 16/02/2018 15:23, KarimAllah Ahmed wrote:
> On exit to L0 user-space, always exit from L2 to L1 and synchronize the
> state properly for L1. This ensures that user-space only ever sees L1
> state. It also allows L1 to be saved and resumed properly. Obviously
> horrible things will still happen to the L2 guest. This will be handled in
> a seperate patch.
> 
> There is only a single case which requires a bit of extra care. When the
> decision to switch to user space happens while handling an L1
> VMRESUME/VMLAUNCH (i.e. pending_nested_run). In order to handle this
> as cleanly as possible without major restructuring, we simply do not exit
> to user-space in this case and give L2 another chance to actually run. We
> also request an immediate exit to ensure that an exit to user space will
> still happen for the L2.
> 
> The only reason I can see where an exit to user space will occur while L2
> is running is because of a pending signal. The is how user space preempts
> the KVM_RUN in order to save the state. L2 exits are either handled in L0
> kernel or reflected to L1 and not handled in L0 user-space.
> 
> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>

We discussed this with Jim about one year ago and then again last
January.  While I (in 2017) and David H. (in 2018) also thought about
doing an L2->L1 exit like this, Jim quickly got me to change my
mind---it doesn't really seem like a good idea compared to doing full
checkpointing of VMX state.  You can find the discussion at
https://patchwork.kernel.org/patch/9454799/.

Of course, Jim's series (first posted Nov 2016) is way more complex than
yours, but the good news is that most of his changes have already been
merged; the only ones missing are:

https://patchwork.kernel.org/patch/9454799/
 [7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE

https://patchwork.kernel.org/patch/9454797/
 [8/8] kvm: nVMX: Defer gpa->hpa lookups for set_vmx_state

The main request was to make [7/8] a bit more generic so that it can be
applied to SVM as well.  That's pretty simple though.

Thanks,

Paolo
KarimAllah Ahmed Feb. 16, 2018, 3:23 p.m. UTC | #2
On 02/16/2018 03:52 PM, Paolo Bonzini wrote:
> On 16/02/2018 15:23, KarimAllah Ahmed wrote:
>> On exit to L0 user-space, always exit from L2 to L1 and synchronize the
>> state properly for L1. This ensures that user-space only ever sees L1
>> state. It also allows L1 to be saved and resumed properly. Obviously
>> horrible things will still happen to the L2 guest. This will be handled in
>> a seperate patch.
>>
>> There is only a single case which requires a bit of extra care. When the
>> decision to switch to user space happens while handling an L1
>> VMRESUME/VMLAUNCH (i.e. pending_nested_run). In order to handle this
>> as cleanly as possible without major restructuring, we simply do not exit
>> to user-space in this case and give L2 another chance to actually run. We
>> also request an immediate exit to ensure that an exit to user space will
>> still happen for the L2.
>>
>> The only reason I can see where an exit to user space will occur while L2
>> is running is because of a pending signal. The is how user space preempts
>> the KVM_RUN in order to save the state. L2 exits are either handled in L0
>> kernel or reflected to L1 and not handled in L0 user-space.
>>
>> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> 
> We discussed this with Jim about one year ago and then again last
> January.  While I (in 2017) and David H. (in 2018) also thought about
> doing an L2->L1 exit like this, Jim quickly got me to change my
> mind---it doesn't really seem like a good idea compared to doing full
> checkpointing of VMX state.  You can find the discussion at
> https://patchwork.kernel.org/patch/9454799/.
> 
> Of course, Jim's series (first posted Nov 2016) is way more complex than
> yours, but the good news is that most of his changes have already been
> merged; the only ones missing are:
> 
> https://patchwork.kernel.org/patch/9454799/
>   [7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE
> 
> https://patchwork.kernel.org/patch/9454797/
>   [8/8] kvm: nVMX: Defer gpa->hpa lookups for set_vmx_state

Oh! Thank you for pointing this out. Somehow I did not notice any of this :)

I was also thinking about doing a full save of VMX state then I decided
to do the switch instead.

In any case, Looking forward to see those bits in master.

> 
> The main request was to make [7/8] a bit more generic so that it can be
> applied to SVM as well.  That's pretty simple though.
> 
> Thanks,
> 
> Paolo
> 
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
KarimAllah Ahmed Feb. 19, 2018, 5:44 a.m. UTC | #3
Now that I looked at the Jim's patch and also went through *most* of 
the comments, I think I realized that my approach is reasonable and in 
fact I do not see any of the downsides mentioned in the other thread 
any more. Specially the one about handling VMLAUNCH case, the code now 
handles this cleanly and using an approach that is already available
in the code base (there is precedent).

Only thing that is not handled, AFAICT, is:

> The other unfortunate thing about flushing the "current" VMCS12 state
> to guest memory on each L2->userspace transition is that much of this
> state is in the VMCS02. So,it's not just a matter of writing a
> VMCS12_SIZE blob to guest memory; first, the cached VMCS12 has to be
> updated from the VMCS02 by calling sync_vmcs12(). This will be
> particularly bad for double-nesting, where L1 kvm has to take all of
> those VMREAD VM-exits.

.. which is something I can actually fix if needed, but is there really
anyone doing this today? Do we actually need to optimize for this at
all?

Is there any thing else that I am missing?

So what are the upsides for my approach:

1- It ensures that user-space tools that does not understand nesting
   can still see the expected guest state when querying guest state or 
   even when trying to read memory, translate an address, etc.
2- It is very simple and does not require a whole lot of state in user-
   space.
3- It's even rebased on master :) (Ok, maybe this is not a technical 
   reason :D)

Thoughts?

On a side note: I have also fixed the VMEntry issue that I mentioned
in the commit message and I have done hundreds of save/resume 
successfully already.

On Fri, 2018-02-16 at 16:23 +0100, KarimAllah Ahmed wrote:
> On 02/16/2018 03:52 PM, Paolo Bonzini wrote:
> > 
> > On 16/02/2018 15:23, KarimAllah Ahmed wrote:
> > > 
> > > On exit to L0 user-space, always exit from L2 to L1 and
> > > synchronize the
> > > state properly for L1. This ensures that user-space only ever
> > > sees L1
> > > state. It also allows L1 to be saved and resumed properly.
> > > Obviously
> > > horrible things will still happen to the L2 guest. This will be
> > > handled in
> > > a seperate patch.
> > > 
> > > There is only a single case which requires a bit of extra care.
> > > When the
> > > decision to switch to user space happens while handling an L1
> > > VMRESUME/VMLAUNCH (i.e. pending_nested_run). In order to handle
> > > this
> > > as cleanly as possible without major restructuring, we simply do
> > > not exit
> > > to user-space in this case and give L2 another chance to actually
> > > run. We
> > > also request an immediate exit to ensure that an exit to user
> > > space will
> > > still happen for the L2.
> > > 
> > > The only reason I can see where an exit to user space will occur
> > > while L2
> > > is running is because of a pending signal. The is how user space
> > > preempts
> > > the KVM_RUN in order to save the state. L2 exits are either
> > > handled in L0
> > > kernel or reflected to L1 and not handled in L0 user-space.
> > > 
> > > Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> > 
> > We discussed this with Jim about one year ago and then again last
> > January.  While I (in 2017) and David H. (in 2018) also thought
> > about
> > doing an L2->L1 exit like this, Jim quickly got me to change my
> > mind---it doesn't really seem like a good idea compared to doing
> > full
> > checkpointing of VMX state.  You can find the discussion at
> > https://patchwork.kernel.org/patch/9454799/.
> > 
> > Of course, Jim's series (first posted Nov 2016) is way more complex
> > than
> > yours, but the good news is that most of his changes have already
> > been
> > merged; the only ones missing are:
> > 
> > https://patchwork.kernel.org/patch/9454799/
> >   [7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE
> > 
> > https://patchwork.kernel.org/patch/9454797/
> >   [8/8] kvm: nVMX: Defer gpa->hpa lookups for set_vmx_state
> 
> Oh! Thank you for pointing this out. Somehow I did not notice any of
> this :)
> 
> I was also thinking about doing a full save of VMX state then I
> decided
> to do the switch instead.
> 
> In any case, Looking forward to see those bits in master.
> 
> > 
> > 
> > The main request was to make [7/8] a bit more generic so that it
> > can be
> > applied to SVM as well.  That's pretty simple though.
> > 
> > Thanks,
> > 
> > Paolo
> > 
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Jim Mattson Feb. 20, 2018, 5:19 p.m. UTC | #4
On Sun, Feb 18, 2018 at 9:44 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:

> Is there any thing else that I am missing?

I haven't looked at your entire patch, but it looks like you are
synthesizing a VM-exit from L2 to L1 for "external interrupt." What if
the “external-interrupt exiting” VM-execution control in vmcs12 is
clear? Then L1 will not be expecting a VM-exit for "external
interrupt."
Paolo Bonzini Feb. 21, 2018, 9:58 a.m. UTC | #5
On 19/02/2018 06:44, KarimAllah Ahmed wrote:
> 1- It ensures that user-space tools that does not understand nesting
>    can still see the expected guest state when querying guest state or 
>    even when trying to read memory, translate an address, etc.

How does it work when L1 assigns an MMIO region or PIO region for direct
L2 access, and then L2 requests MMIO from userspace?  You cannot do a
vmexit in that case.

> 2- It is very simple and does not require a whole lot of state in user-
>    space.

It's still requires _some_ state.  If you need a new API, the amount of
state that the API saves/restores is not particularly interesting.

Paolo
KarimAllah Ahmed Feb. 21, 2018, 12:59 p.m. UTC | #6
On Wed, 2018-02-21 at 10:58 +0100, Paolo Bonzini wrote:
> On 19/02/2018 06:44, KarimAllah Ahmed wrote:

> > 

> > 1- It ensures that user-space tools that does not understand nesting

> >    can still see the expected guest state when querying guest state or 

> >    even when trying to read memory, translate an address, etc.

> 

> How does it work when L1 assigns an MMIO region or PIO region for direct

> L2 access, and then L2 requests MMIO from userspace?  You cannot do a

> vmexit in that case.


Ah, right! I was testing with a real device that is passed through to 
L1! and with QEMU/KVM running as L1 hypervisor that emulates e1000 for 
L2 on L1

Technically I can solve this problem, but it would not satisfy the (1) 
stated above. So the solution would be slightly more ugly (or 
significantly more).

> 

> > 

> > 2- It is very simple and does not require a whole lot of state in user-

> >    space.

> 

> It's still requires _some_ state.  If you need a new API, the amount of

> state that the API saves/restores is not particularly interesting.


I thought at some point about creating a new MSR and add it in the 
MSRs to save list.

The issue you mentioned above pretty much killed this approach for me
:)
Ironically it still works fine in our environment but it will not 
work fine for the general use-case.

> 

> Paolo

> 
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Paolo Bonzini Feb. 21, 2018, 1:29 p.m. UTC | #7
On 21/02/2018 13:59, Raslan, KarimAllah wrote:
> Ironically it still works fine in our environment but it will not 
> work fine for the general use-case.

Yeah, because you have so many assigned devices and don't really have
much to do in userspace.  But it could still break with an L1 hypervisor
that is not Xen or KVM or anything like that.

Paolo
KarimAllah Ahmed Feb. 21, 2018, 4:29 p.m. UTC | #8
On Tue, 2018-02-20 at 09:19 -0800, Jim Mattson wrote:
> On Sun, Feb 18, 2018 at 9:44 PM, KarimAllah Ahmed <karahmed@amazon.de> wrote:

> 

> > 

> > Is there any thing else that I am missing?

> 

> I haven't looked at your entire patch, but it looks like you are

> synthesizing a VM-exit from L2 to L1 for "external interrupt." What if

> the “external-interrupt exiting” VM-execution control in vmcs12 is

> clear? Then L1 will not be expecting a VM-exit for "external

> interrupt."


Yup, I thought about that but then I hoped that all L1 hypervisors 
would not mind this :)

Anyway, your approach is indeed superior and seems to fix all outlined 
problems.
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Konrad Rzeszutek Wilk March 27, 2018, 7:23 p.m. UTC | #9
On Fri, Feb 16, 2018 at 04:23:33PM +0100, KarimAllah Ahmed wrote:
> On 02/16/2018 03:52 PM, Paolo Bonzini wrote:
> > On 16/02/2018 15:23, KarimAllah Ahmed wrote:
> > > On exit to L0 user-space, always exit from L2 to L1 and synchronize the
> > > state properly for L1. This ensures that user-space only ever sees L1
> > > state. It also allows L1 to be saved and resumed properly. Obviously
> > > horrible things will still happen to the L2 guest. This will be handled in
> > > a seperate patch.
> > > 
> > > There is only a single case which requires a bit of extra care. When the
> > > decision to switch to user space happens while handling an L1
> > > VMRESUME/VMLAUNCH (i.e. pending_nested_run). In order to handle this
> > > as cleanly as possible without major restructuring, we simply do not exit
> > > to user-space in this case and give L2 another chance to actually run. We
> > > also request an immediate exit to ensure that an exit to user space will
> > > still happen for the L2.
> > > 
> > > The only reason I can see where an exit to user space will occur while L2
> > > is running is because of a pending signal. The is how user space preempts
> > > the KVM_RUN in order to save the state. L2 exits are either handled in L0
> > > kernel or reflected to L1 and not handled in L0 user-space.
> > > 
> > > Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
> > 
> > We discussed this with Jim about one year ago and then again last
> > January.  While I (in 2017) and David H. (in 2018) also thought about
> > doing an L2->L1 exit like this, Jim quickly got me to change my
> > mind---it doesn't really seem like a good idea compared to doing full
> > checkpointing of VMX state.  You can find the discussion at
> > https://patchwork.kernel.org/patch/9454799/.
> > 
> > Of course, Jim's series (first posted Nov 2016) is way more complex than
> > yours, but the good news is that most of his changes have already been
> > merged; the only ones missing are:
> > 
> > https://patchwork.kernel.org/patch/9454799/
> >   [7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE
> > 
> > https://patchwork.kernel.org/patch/9454797/
> >   [8/8] kvm: nVMX: Defer gpa->hpa lookups for set_vmx_state
> 
> Oh! Thank you for pointing this out. Somehow I did not notice any of this :)
> 
> I was also thinking about doing a full save of VMX state then I decided
> to do the switch instead.
> 
> In any case, Looking forward to see those bits in master.

.. Is somebody working on this or would it make sense to have a couple of
folks work together on this?
Paolo Bonzini March 27, 2018, 8:16 p.m. UTC | #10
On 27/03/2018 21:23, Konrad Rzeszutek Wilk wrote:
>>> https://patchwork.kernel.org/patch/9454799/
>>>   [7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE
>>>
>>> https://patchwork.kernel.org/patch/9454797/
>>>   [8/8] kvm: nVMX: Defer gpa->hpa lookups for set_vmx_state
>> Oh! Thank you for pointing this out. Somehow I did not notice any of this :)
>>
>> I was also thinking about doing a full save of VMX state then I decided
>> to do the switch instead.
>>
>> In any case, Looking forward to see those bits in master.
>
> .. Is somebody working on this or would it make sense to have a couple of
> folks work together on this?

I was hoping for Karim to beat me, but I plan to look at it in April.

Paolo
KarimAllah Ahmed March 27, 2018, 9:51 p.m. UTC | #11
On Tue, 2018-03-27 at 22:16 +0200, Paolo Bonzini wrote:
> On 27/03/2018 21:23, Konrad Rzeszutek Wilk wrote:

> > 

> > > 

> > > > 

> > > > https://patchwork.kernel.org/patch/9454799/

> > > >   [7/8] kvm: nVMX: Introduce KVM_CAP_VMX_STATE

> > > > 

> > > > https://patchwork.kernel.org/patch/9454797/

> > > >   [8/8] kvm: nVMX: Defer gpa->hpa lookups for set_vmx_state

> > > Oh! Thank you for pointing this out. Somehow I did not notice any of this :)

> > > 

> > > I was also thinking about doing a full save of VMX state then I decided

> > > to do the switch instead.

> > > 

> > > In any case, Looking forward to see those bits in master.

> > 

> > .. Is somebody working on this or would it make sense to have a couple of

> > folks work together on this?

> 

> I was hoping for Karim to beat me, but I plan to look at it in April.


I actually did not touch them because I thought that Jim would send 
another round of the patches.

Anyway, if Jim is OK with it I can pick them up and resend.

> 

> Paolo

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Paolo Bonzini March 28, 2018, 8:59 a.m. UTC | #12
On 27/03/2018 23:51, Raslan, KarimAllah wrote:
>>> .. Is somebody working on this or would it make sense to have a couple of
>>> folks work together on this?
>> I was hoping for Karim to beat me, but I plan to look at it in April.
> I actually did not touch them because I thought that Jim would send 
> another round of the patches.
> 
> Anyway, if Jim is OK with it I can pick them up and resend.

I think you can assume that, since the patches have been posted over a
year ago.  You have my blessing. :)

Paolo
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 318a414..2c8be56 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -961,6 +961,8 @@  struct kvm_x86_ops {
 			      struct msr_bitmap_range *whitelist);
 
 	void (*prepare_guest_switch)(struct kvm_vcpu *vcpu);
+	void (*prepare_exit_user)(struct kvm_vcpu *vcpu);
+	bool (*allow_exit_user)(struct kvm_vcpu *vcpu);
 	void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
 	void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 52539be..22eb0dc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2130,6 +2130,42 @@  static unsigned long segment_base(u16 selector)
 }
 #endif
 
+static bool vmx_allow_exit_user(struct kvm_vcpu *vcpu)
+{
+	return !to_vmx(vcpu)->nested.nested_run_pending;
+}
+
+static void vmx_prepare_exit_user(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (vmx->nested.current_vmptr == -1ull)
+                return;
+
+	/*
+	 * If L2 is running no need to update vmcs12 from shadow VMCS.
+	 * Just force an exit from L2 to L1
+	 */
+	if (is_guest_mode(vcpu)) {
+		/*
+		 * Pretend that an external interrupt occurred while L2 is
+		 * running to cleanly exit into L1.
+		 */
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
+
+		/* Switch from L2 MMU to L1 MMU */
+		kvm_mmu_reset_context(vcpu);
+	} else if (enable_shadow_vmcs) {
+		copy_shadow_to_vmcs12(vmx);
+	}
+
+	/* Flush VMCS12 to guest memory */
+	kvm_write_guest(vcpu->kvm, vmx->nested.current_vmptr,
+			get_vmcs12(vcpu), sizeof(*vmx->nested.cached_vmcs12));
+
+	return;
+}
+
 static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -12440,6 +12476,9 @@  static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
 
 	.whitelist_msrs = vmx_whitelist_msrs,
 
+	.prepare_exit_user = vmx_prepare_exit_user,
+	.allow_exit_user = vmx_allow_exit_user,
+
 	.prepare_guest_switch = vmx_save_host_state,
 	.vcpu_load = vmx_vcpu_load,
 	.vcpu_put = vmx_vcpu_put,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2cfbf39..8256a2d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -996,6 +996,12 @@  bool kvm_rdpmc(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_rdpmc);
 
+static __always_inline bool should_exit_user(struct kvm_vcpu *vcpu)
+{
+	return signal_pending(current) && (kvm_x86_ops->allow_exit_user ?
+					   kvm_x86_ops->allow_exit_user(vcpu): true);
+}
+
 /*
  * List of msr numbers which we expose to userspace through KVM_GET_MSRS
  * and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
@@ -7187,8 +7193,7 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
 		kvm_x86_ops->sync_pir_to_irr(vcpu);
 
-	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
-	    || need_resched() || signal_pending(current)) {
+	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) || need_resched()) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
 		local_irq_enable();
@@ -7198,6 +7203,20 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		goto cancel_injection;
 	}
 
+	if (signal_pending(current)) {
+		if (kvm_x86_ops->allow_exit_user &&
+		    kvm_x86_ops->allow_exit_user(vcpu)) {
+			vcpu->mode = OUTSIDE_GUEST_MODE;
+			smp_wmb();
+			local_irq_enable();
+			preempt_enable();
+			vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+			r = 1;
+			goto cancel_injection;
+		} else
+			req_immediate_exit = true;
+	}
+
 	kvm_load_guest_xcr0(vcpu);
 
 	if (req_immediate_exit) {
@@ -7364,7 +7383,7 @@  static int vcpu_run(struct kvm_vcpu *vcpu)
 
 		kvm_check_async_pf_completion(vcpu);
 
-		if (signal_pending(current)) {
+		if (should_exit_user(vcpu)) {
 			r = -EINTR;
 			vcpu->run->exit_reason = KVM_EXIT_INTR;
 			++vcpu->stat.signal_exits;
@@ -7506,11 +7525,15 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 	} else
 		WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
 
-	if (kvm_run->immediate_exit)
+	if (kvm_run->immediate_exit) {
 		r = -EINTR;
-	else
+	} else {
 		r = vcpu_run(vcpu);
 
+		if (kvm_x86_ops->prepare_exit_user)
+			kvm_x86_ops->prepare_exit_user(vcpu);
+	}
+
 out:
 	kvm_put_guest_fpu(vcpu);
 	post_kvm_run_save(vcpu);