[v2,3/5] KVM: irqfd: Postpone resamplefd notify for oneshot interrupts

Message ID 20220805193919.1470653-4-dmy@semihalf.com
State New, archived
Series KVM: Fix oneshot interrupts forwarding

Commit Message

Dmytro Maluka Aug. 5, 2022, 7:39 p.m. UTC
The existing KVM mechanism for forwarding of level-triggered interrupts
using resample eventfd doesn't work quite correctly in the case of
interrupts that are handled in a Linux guest as oneshot interrupts
(IRQF_ONESHOT). Such an interrupt is acked to the device in its
threaded irq handler, i.e. later than it is acked to the interrupt
controller (EOI at the end of hardirq), not earlier.

Linux keeps such an interrupt masked until its threaded handler finishes,
to prevent the EOI from re-asserting an unacknowledged interrupt.
However, with KVM + vfio (or whatever is listening on the resamplefd)
we don't check that the interrupt is still masked in the guest at the
moment of EOI. Resamplefd is notified regardless, so vfio prematurely
unmasks the host physical IRQ, thus a new (unwanted) physical interrupt
is generated in the host and queued for injection to the guest.

The fact that the virtual IRQ is still masked doesn't prevent this new
physical IRQ from being propagated to the guest, because:

1. It is not guaranteed that the vIRQ will still be masked by the time
   vfio signals the trigger eventfd.
2. KVM marks this IRQ as pending (e.g. setting its bit in the virtual
   IRR register of IOAPIC on x86), so after the vIRQ is unmasked, this
   new pending interrupt is injected by KVM to the guest anyway.

At least 2 user-visible issues caused by those extra erroneous pending
interrupts for oneshot irqs have been observed in the guest:

1. System suspend aborted due to a pending wakeup interrupt from
   ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
   (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
   every time the touchpad is touched.

This patch fixes the issue on x86 by checking whether the interrupt is
unmasked when we receive the irq ack (EOI) and, if it is masked,
postponing the resamplefd notification until the guest unmasks it.

It doesn't fix the issue for other archs yet, since it relies on the KVM
irq mask notifier functionality, which currently works only on x86.
On other archs we can register mask notifiers but they are never called,
so resampler->masked is always false and the behavior is the same as
before this patch.
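
The sequence described above can be followed in a small userspace model.
This is only a sketch under simplifying assumptions, not kernel code: the
struct, the function names and the single-threaded event order are all
hypothetical, and the de-assert done by the ack path is left out. It
counts how many interrupts reach the guest for one device event handled
as IRQF_ONESHOT, with and without postponing the resamplefd notification.

```c
#include <stdbool.h>

/* Toy model of one level-triggered line forwarded via vfio + KVM irqfd.
 * All names are hypothetical; locking and real eventfds are omitted. */
struct model {
	bool dev_line;     /* device is asserting its interrupt line */
	bool host_masked;  /* vfio has masked the host physical IRQ */
	bool virq_masked;  /* guest has masked the vIRQ */
	bool virq_irr;     /* vIRQ is latched as pending (IRR bit) */
	bool postponed;    /* resamplefd notification deferred to unmask */
	int delivered;     /* interrupts actually delivered to the guest */
};

/* Host: if the line is asserted and the host IRQ is unmasked, vfio masks
 * the host IRQ and signals the trigger eventfd, so KVM latches the vIRQ. */
static void host_poll(struct model *m)
{
	if (m->dev_line && !m->host_masked) {
		m->host_masked = true;
		m->virq_irr = true;
	}
}

/* resamplefd listener (vfio): unmask the host IRQ; it may fire again. */
static void resample_notify(struct model *m)
{
	m->host_masked = false;
	host_poll(m);
}

/* Deliver a pending vIRQ if the guest has it unmasked. */
static bool guest_take_irq(struct model *m)
{
	if (!m->virq_irr || m->virq_masked)
		return false;
	m->virq_irr = false;
	m->delivered++;
	return true;
}

/* EOI from the guest: notify now, or defer if the vIRQ is masked. */
static void guest_eoi(struct model *m, bool postpone_if_masked)
{
	if (postpone_if_masked && m->virq_masked)
		m->postponed = true;
	else
		resample_notify(m);
}

/* Guest unmasks the vIRQ: send any deferred notification. */
static void guest_unmask(struct model *m)
{
	m->virq_masked = false;
	if (m->postponed) {
		m->postponed = false;
		resample_notify(m);
	}
}

/* One device event; returns the number of interrupts the guest sees. */
int simulate_oneshot(bool postpone_if_masked)
{
	struct model m = { .dev_line = true };

	host_poll(&m);
	while (guest_take_irq(&m)) {
		m.virq_masked = true;              /* hardirq: oneshot mask */
		guest_eoi(&m, postpone_if_masked); /* end of hardirq: EOI */
		m.dev_line = false;                /* thread: ack the device */
		guest_unmask(&m);                  /* thread done: unmask */
	}
	return m.delivered;
}
```

In this model simulate_oneshot(false) yields one extra interrupt for a
single device event, while simulate_oneshot(true) yields exactly one.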

Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
---
 include/linux/kvm_irqfd.h | 14 ++++++++++
 virt/kvm/eventfd.c        | 56 +++++++++++++++++++++++++++++++++++----
 2 files changed, 65 insertions(+), 5 deletions(-)

Comments

Eric Auger Aug. 9, 2022, 8:45 p.m. UTC | #1
Hi Dmytro,

On 8/5/22 21:39, Dmytro Maluka wrote:
> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
> index dac047abdba7..01754a1abb9e 100644
> --- a/include/linux/kvm_irqfd.h
> +++ b/include/linux/kvm_irqfd.h
> @@ -19,6 +19,16 @@
>   * resamplefd.  All resamplers on the same gsi are de-asserted
>   * together, so we don't need to track the state of each individual
>   * user.  We can also therefore share the same irq source ID.
> + *
> + * A special case is when the interrupt is still masked at the moment
> + * an irq ack is received. That likely means that the interrupt has
> + * been acknowledged to the interrupt controller but not acknowledged
> + * to the device yet, e.g. it might be a Linux guest's threaded
> + * oneshot interrupt (IRQF_ONESHOT). In this case notifying through
> + * resamplefd is postponed until the guest unmasks the interrupt,
> + * which is detected through the irq mask notifier. This prevents
> + * erroneous extra interrupts caused by premature re-assert of an
> + * unacknowledged interrupt by the resamplefd listener.
>   */
>  struct kvm_kernel_irqfd_resampler {
>  	struct kvm *kvm;
> @@ -28,6 +38,10 @@ struct kvm_kernel_irqfd_resampler {
>  	 */
>  	struct list_head list;
>  	struct kvm_irq_ack_notifier notifier;
> +	struct kvm_irq_mask_notifier mask_notifier;
> +	bool masked;
> +	bool pending;
> +	spinlock_t lock;
>  	/*
>  	 * Entry in list of kvm->irqfd.resampler_list.  Use for sharing
>  	 * resamplers among irqfds on the same gsi.
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 3007d956b626..f98dcce3959c 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -67,6 +67,7 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
>  	struct kvm *kvm;
>  	struct kvm_kernel_irqfd *irqfd;
>  	int idx;
> +	bool notify = true;
>  
>  	resampler = container_of(kian,
>  			struct kvm_kernel_irqfd_resampler, notifier);
> @@ -75,13 +76,52 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
>  	kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
>  		    resampler->notifier.gsi, 0, false);
>  
> -	idx = srcu_read_lock(&kvm->irq_srcu);
> +	spin_lock(&resampler->lock);
> +	if (resampler->masked) {
> +		notify = false;
> +		resampler->pending = true;
> +	}
> +	spin_unlock(&resampler->lock);
> +
> +	if (notify) {
> +		idx = srcu_read_lock(&kvm->irq_srcu);
>  
> -	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
> -	    srcu_read_lock_held(&kvm->irq_srcu))
> -		eventfd_signal(irqfd->resamplefd, 1);
> +		list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
> +		    srcu_read_lock_held(&kvm->irq_srcu))
> +			eventfd_signal(irqfd->resamplefd, 1);
nit: you may introduce a helper for above code as the code is duplicated.
>  
> -	srcu_read_unlock(&kvm->irq_srcu, idx);
> +		srcu_read_unlock(&kvm->irq_srcu, idx);
> +	}
> +}
> +
> +static void irqfd_resampler_mask_notify(struct kvm_irq_mask_notifier *kimn,
> +					bool masked)
> +{
> +	struct kvm_kernel_irqfd_resampler *resampler;
> +	struct kvm *kvm;
> +	struct kvm_kernel_irqfd *irqfd;
> +	int idx;
> +	bool notify;
> +
> +	resampler = container_of(kimn,
> +			struct kvm_kernel_irqfd_resampler, mask_notifier);
> +	kvm = resampler->kvm;
> +
> +	spin_lock(&resampler->lock);
> +	notify = !masked && resampler->pending;
> +	resampler->masked = masked;
> +	resampler->pending = false;
> +	spin_unlock(&resampler->lock);
> +
> +	if (notify) {
> +		idx = srcu_read_lock(&kvm->irq_srcu);
> +
> +		list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
> +		    srcu_read_lock_held(&kvm->irq_srcu))
> +			eventfd_signal(irqfd->resamplefd, 1);
> +
> +		srcu_read_unlock(&kvm->irq_srcu, idx);
> +	}
>  }
>  
>  static void
> @@ -98,6 +138,8 @@ irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
>  	if (list_empty(&resampler->list)) {
>  		list_del(&resampler->link);
>  		kvm_unregister_irq_ack_notifier(kvm, &resampler->notifier);
> +		kvm_unregister_irq_mask_notifier(kvm, resampler->mask_notifier.irq,
> +						 &resampler->mask_notifier);
>  		kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
>  			    resampler->notifier.gsi, 0, false);
>  		kfree(resampler);
> @@ -367,9 +409,13 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  			INIT_LIST_HEAD(&resampler->list);
>  			resampler->notifier.gsi = irqfd->gsi;
>  			resampler->notifier.irq_acked = irqfd_resampler_ack;
> +			resampler->mask_notifier.func = irqfd_resampler_mask_notify;
> +			spin_lock_init(&resampler->lock);
>  			INIT_LIST_HEAD(&resampler->link);
>  
>  			list_add(&resampler->link, &kvm->irqfds.resampler_list);
> +			kvm_register_and_fire_irq_mask_notifier(kvm, irqfd->gsi,
> +								&resampler->mask_notifier);
>  			kvm_register_irq_ack_notifier(kvm,
>  						      &resampler->notifier);
>  			irqfd->resampler = resampler;
Adding Marc in CC

Thanks

Eric
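
One possible shape for the helper suggested in the nit above, shown as a
userspace sketch rather than a kernel patch. The names resampler_notify,
struct listener and struct resampler are hypothetical. In the kernel the
helper body would presumably wrap the srcu_read_lock() /
list_for_each_entry_srcu() / eventfd_signal() / srcu_read_unlock()
sequence that the patch currently repeats; here a plain linked list and a
counter stand in for it, and the spinlock around masked/pending is
omitted because the model is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for one irqfd with a resamplefd attached (hypothetical). */
struct listener {
	unsigned long signals;  /* how many times it was notified */
	struct listener *next;
};

/* Stand-in for the resampler state touched by the patch (hypothetical). */
struct resampler {
	struct listener *listeners;
	bool masked;
	bool pending;
};

/* Shared helper: notify every listener of this resampler once. */
static void resampler_notify(struct resampler *r)
{
	for (struct listener *l = r->listeners; l; l = l->next)
		l->signals++;
}

/* Ack (EOI) path: defer the notification while the irq is masked. */
void resampler_ack(struct resampler *r)
{
	if (r->masked) {
		r->pending = true;
		return;
	}
	resampler_notify(r);
}

/* Mask notifier path: flush a deferred notification on unmask. */
void resampler_mask_notify(struct resampler *r, bool masked)
{
	bool notify = !masked && r->pending;

	r->masked = masked;
	r->pending = false;
	if (notify)
		resampler_notify(r);
}
```

Both callers keep their own decision logic and share only the list walk,
which is the part duplicated in the patch.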
Dmytro Maluka Aug. 9, 2022, 11:57 p.m. UTC | #2
On 8/9/22 10:45 PM, Eric Auger wrote:
> Hi Dmytro,
> 
> On 8/5/22 21:39, Dmytro Maluka wrote:
>> +	if (notify) {
>> +		idx = srcu_read_lock(&kvm->irq_srcu);
>>  
>> -	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
>> -	    srcu_read_lock_held(&kvm->irq_srcu))
>> -		eventfd_signal(irqfd->resamplefd, 1);
>> +		list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
>> +		    srcu_read_lock_held(&kvm->irq_srcu))
>> +			eventfd_signal(irqfd->resamplefd, 1);
> nit: you may introduce a helper for above code as the code is duplicated.

Ack.

Marc Zyngier Aug. 10, 2022, 8:41 a.m. UTC | #3
On Tue, 09 Aug 2022 21:45:25 +0100,
Eric Auger <eric.auger@redhat.com> wrote:
> 
> Hi Dmytro,
> 
> On 8/5/22 21:39, Dmytro Maluka wrote:
> > The existing KVM mechanism for forwarding of level-triggered interrupts
> > using resample eventfd doesn't work quite correctly in the case of
> > interrupts that are handled in a Linux guest as oneshot interrupts
> > (IRQF_ONESHOT). Such an interrupt is acked to the device in its
> > threaded irq handler, i.e. later than it is acked to the interrupt
> > controller (EOI at the end of hardirq), not earlier.
> >
> > Linux keeps such interrupt masked until its threaded handler finishes,
> > to prevent the EOI from re-asserting an unacknowledged interrupt.
> > However, with KVM + vfio (or whatever is listening on the resamplefd)
> > we don't check that the interrupt is still masked in the guest at the
> > moment of EOI. Resamplefd is notified regardless, so vfio prematurely
> > unmasks the host physical IRQ, thus a new (unwanted) physical interrupt
> > is generated in the host and queued for injection to the guest.
> >
> > The fact that the virtual IRQ is still masked doesn't prevent this new
> > physical IRQ from being propagated to the guest, because:
> >
> > 1. It is not guaranteed that the vIRQ will remain masked by the time
> >    when vfio signals the trigger eventfd.
> > 2. KVM marks this IRQ as pending (e.g. setting its bit in the virtual
> >    IRR register of IOAPIC on x86), so after the vIRQ is unmasked, this
> >    new pending interrupt is injected by KVM to the guest anyway.
> >
> > There are observed at least 2 user-visible issues caused by those
> > extra erroneous pending interrupts for oneshot irq in the guest:
> >
> > 1. System suspend aborted due to a pending wakeup interrupt from
> >    ChromeOS EC (drivers/platform/chrome/cros_ec.c).
> > 2. Annoying "invalid report id data" errors from ELAN0000 touchpad
> >    (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
> >    every time the touchpad is touched.
> >
> > This patch fixes the issue on x86 by checking if the interrupt is
> > unmasked when we receive irq ack (EOI) and, in case if it's masked,
> > postponing resamplefd notify until the guest unmasks it.
> >
> > It doesn't fix the issue for other archs yet, since it relies on KVM
> > irq mask notifiers functionality which currently works only on x86.
> > On other archs we can register mask notifiers but they are never called.
> > So on other archs resampler->masked is always false, so the behavior is
> > the same as before this patch.

The core issue seems to be that you would like to be able to retire an
interrupt from what has been queued into the guest by a previous
resampling (because the line has effectively dropped in the meantime).

On arm64, it would be easy enough to sample the pending state of the
physical line and adjust the state of the virtual interrupt
accordingly. This would at least have the advantage of preserving the
illusion of an interrupt being directly routed to the guest and its
pending state being preserved between EOI and unmask.

It isn't perfect either though as, assuming the guest can ack the
interrupt on the device without exiting, the line would still appear
as pending until the next exit, possibly the unmask.

	M.
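
The sampling idea and its limitation can be illustrated with a very small
model. This is a sketch with hypothetical names, not arm64 KVM code: it
assumes the hypervisor copies the physical line state into the virtual
pending state only when it regains control (for example on a guest exit).
How the physical state would actually be read in the kernel is not stated
in this thread; irq_get_irqchip_state() with IRQCHIP_STATE_PENDING looks
like a plausible candidate, but that is an assumption.

```c
#include <stdbool.h>

/* Hypothetical model of one forwarded level-triggered interrupt. */
struct fwd_irq {
	bool phys_pending;  /* state of the physical line */
	bool virt_pending;  /* pending state exposed to the guest */
};

/* The device raises or drops its line without involving the hypervisor. */
void device_set_line(struct fwd_irq *irq, bool asserted)
{
	irq->phys_pending = asserted;
}

/* Run when the hypervisor regains control, e.g. on a guest exit:
 * make the virtual pending state follow the physical line. */
void hyp_sync_on_exit(struct fwd_irq *irq)
{
	irq->virt_pending = irq->phys_pending;
}
```

If the guest acks the device without exiting, virt_pending stays true
until the next hyp_sync_on_exit() call, which is the window described in
the last paragraph above.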
Patch

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dac047abdba7..01754a1abb9e 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -19,6 +19,16 @@ 
  * resamplefd.  All resamplers on the same gsi are de-asserted
  * together, so we don't need to track the state of each individual
  * user.  We can also therefore share the same irq source ID.
+ *
+ * A special case is when the interrupt is still masked at the moment
+ * an irq ack is received. That likely means that the interrupt has
+ * been acknowledged to the interrupt controller but not acknowledged
+ * to the device yet, e.g. it might be a Linux guest's threaded
+ * oneshot interrupt (IRQF_ONESHOT). In this case notifying through
+ * resamplefd is postponed until the guest unmasks the interrupt,
+ * which is detected through the irq mask notifier. This prevents
+ * erroneous extra interrupts caused by premature re-assert of an
+ * unacknowledged interrupt by the resamplefd listener.
  */
 struct kvm_kernel_irqfd_resampler {
 	struct kvm *kvm;
@@ -28,6 +38,10 @@  struct kvm_kernel_irqfd_resampler {
 	 */
 	struct list_head list;
 	struct kvm_irq_ack_notifier notifier;
+	struct kvm_irq_mask_notifier mask_notifier;
+	bool masked;
+	bool pending;
+	spinlock_t lock;
 	/*
 	 * Entry in list of kvm->irqfd.resampler_list.  Use for sharing
 	 * resamplers among irqfds on the same gsi.
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 3007d956b626..f98dcce3959c 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -67,6 +67,7 @@  irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
 	struct kvm *kvm;
 	struct kvm_kernel_irqfd *irqfd;
 	int idx;
+	bool notify = true;
 
 	resampler = container_of(kian,
 			struct kvm_kernel_irqfd_resampler, notifier);
@@ -75,13 +76,52 @@  irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
 	kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
 		    resampler->notifier.gsi, 0, false);
 
-	idx = srcu_read_lock(&kvm->irq_srcu);
+	spin_lock(&resampler->lock);
+	if (resampler->masked) {
+		notify = false;
+		resampler->pending = true;
+	}
+	spin_unlock(&resampler->lock);
+
+	if (notify) {
+		idx = srcu_read_lock(&kvm->irq_srcu);
 
-	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
-	    srcu_read_lock_held(&kvm->irq_srcu))
-		eventfd_signal(irqfd->resamplefd, 1);
+		list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
+		    srcu_read_lock_held(&kvm->irq_srcu))
+			eventfd_signal(irqfd->resamplefd, 1);
 
-	srcu_read_unlock(&kvm->irq_srcu, idx);
+		srcu_read_unlock(&kvm->irq_srcu, idx);
+	}
+}
+
+static void irqfd_resampler_mask_notify(struct kvm_irq_mask_notifier *kimn,
+					bool masked)
+{
+	struct kvm_kernel_irqfd_resampler *resampler;
+	struct kvm *kvm;
+	struct kvm_kernel_irqfd *irqfd;
+	int idx;
+	bool notify;
+
+	resampler = container_of(kimn,
+			struct kvm_kernel_irqfd_resampler, mask_notifier);
+	kvm = resampler->kvm;
+
+	spin_lock(&resampler->lock);
+	notify = !masked && resampler->pending;
+	resampler->masked = masked;
+	resampler->pending = false;
+	spin_unlock(&resampler->lock);
+
+	if (notify) {
+		idx = srcu_read_lock(&kvm->irq_srcu);
+
+		list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
+		    srcu_read_lock_held(&kvm->irq_srcu))
+			eventfd_signal(irqfd->resamplefd, 1);
+
+		srcu_read_unlock(&kvm->irq_srcu, idx);
+	}
 }
 
 static void
@@ -98,6 +138,8 @@  irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
 	if (list_empty(&resampler->list)) {
 		list_del(&resampler->link);
 		kvm_unregister_irq_ack_notifier(kvm, &resampler->notifier);
+		kvm_unregister_irq_mask_notifier(kvm, resampler->mask_notifier.irq,
+						 &resampler->mask_notifier);
 		kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
 			    resampler->notifier.gsi, 0, false);
 		kfree(resampler);
@@ -367,9 +409,13 @@  kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 			INIT_LIST_HEAD(&resampler->list);
 			resampler->notifier.gsi = irqfd->gsi;
 			resampler->notifier.irq_acked = irqfd_resampler_ack;
+			resampler->mask_notifier.func = irqfd_resampler_mask_notify;
+			spin_lock_init(&resampler->lock);
 			INIT_LIST_HEAD(&resampler->link);
 
 			list_add(&resampler->link, &kvm->irqfds.resampler_list);
+			kvm_register_and_fire_irq_mask_notifier(kvm, irqfd->gsi,
+								&resampler->mask_notifier);
 			kvm_register_irq_ack_notifier(kvm,
 						      &resampler->notifier);
 			irqfd->resampler = resampler;