diff mbox series

[v3,15/16] kvm: x86: ioapic: Lazy update IOAPIC EOI

Message ID 1568401242-260374-16-git-send-email-suravee.suthikulpanit@amd.com (mailing list archive)
State New, archived
Headers show
Series kvm: x86: Support AMD SVM AVIC w/ in-kernel irqchip mode | expand

Commit Message

Suthikulpanit, Suravee Sept. 13, 2019, 7:01 p.m. UTC
In-kernel IOAPIC does not receive EOI with AMD SVM AVIC
since the processor accelerate write to APIC EOI register and
does not trap if the interrupt is edge-triggered.

Workaround this by lazy check for pending APIC EOI at the time when
setting new IOPIC irq, and update IOAPIC EOI if no pending APIC EOI.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/ioapic.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Comments

Paolo Bonzini Oct. 9, 2019, 9:21 a.m. UTC | #1
On 13/09/19 21:01, Suthikulpanit, Suravee wrote:
>  	/*
> +	 * In case APICv accelerate EOI write and do not trap,
> +	 * in-kernel IOAPIC will not be able to receive the EOI.
> +	 * In this case, we do lazy update of the pending EOI when
> +	 * trying to set IOAPIC irq.
> +	 */
> +	if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
> +		ioapic_lazy_update_eoi(ioapic, irq);
> +

This is okay for the RTC, and in fact I suggest that you make it work
like this even for Intel.  This will get rid of kvm_apicv_eoi_accelerate
and be nicer overall.

However, it cannot work for the in-kernel PIT, because it is currently
checking ps->irq_ack before kvm_set_irq.  Unfortunately, the in-kernel
PIT is relying on the ack notifier to timely queue the pt->worker work
item and reinject the missed tick.

Thus, you cannot enable APICv if ps->reinject is true.

Perhaps you can make kvm->arch.apicv_state a disabled counter?  Then
Hyper-V can increment it when enabled, PIT can increment it when
ps->reinject becomes true and decrement it when it becomes false;
finally, svm.c can increment it when an SVM guest is launched and
increment/decrement it around ExtINT handling?

(This conflicts with some of the suggestions I made earlier, which
implied the existence of apicv_state, but everything should if anything
become easier).

Paolo
Roman Kagan Oct. 9, 2019, 9:55 a.m. UTC | #2
On Wed, Oct 09, 2019 at 11:21:41AM +0200, Paolo Bonzini wrote:
> On 13/09/19 21:01, Suthikulpanit, Suravee wrote:
> >  	/*
> > +	 * In case APICv accelerate EOI write and do not trap,
> > +	 * in-kernel IOAPIC will not be able to receive the EOI.
> > +	 * In this case, we do lazy update of the pending EOI when
> > +	 * trying to set IOAPIC irq.
> > +	 */
> > +	if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
> > +		ioapic_lazy_update_eoi(ioapic, irq);
> > +
> 
> This is okay for the RTC, and in fact I suggest that you make it work
> like this even for Intel.  This will get rid of kvm_apicv_eoi_accelerate
> and be nicer overall.
> 
> However, it cannot work for the in-kernel PIT, because it is currently
> checking ps->irq_ack before kvm_set_irq.  Unfortunately, the in-kernel
> PIT is relying on the ack notifier to timely queue the pt->worker work
> item and reinject the missed tick.
> 
> Thus, you cannot enable APICv if ps->reinject is true.
> 
> Perhaps you can make kvm->arch.apicv_state a disabled counter?  Then
> Hyper-V can increment it when enabled, PIT can increment it when
> ps->reinject becomes true and decrement it when it becomes false;
> finally, svm.c can increment it when an SVM guest is launched and
> increment/decrement it around ExtINT handling?

This can benefit Hyper-V emulation too.  The point is that it's only
AutoEOI feature in SynIC that is incompatible with APICv.  So the VM can
use APICv until the guest configures its first AutoEOI SINT.  If the
hypervisor sets HV_DEPRECATING_AEOI_RECOMMENDED (bit 9) in
HYPERV_CPUID_ENLIGHTMENT_INFO (0x40000004) cpuid this may never happen
so we will not be pessimizing guests on modern hardware by merely
enabling SynIC.  I started looking into this recently and would be happy
to piggy-back on this series.

Roman.

> (This conflicts with some of the suggestions I made earlier, which
> implied the existence of apicv_state, but everything should if anything
> become easier).
> 
> Paolo
Suthikulpanit, Suravee Oct. 31, 2019, 3:17 p.m. UTC | #3
Paolo,

On 10/9/19 4:21 AM, Paolo Bonzini wrote:
> On 13/09/19 21:01, Suthikulpanit, Suravee wrote:
>>   	/*
>> +	 * In case APICv accelerate EOI write and do not trap,
>> +	 * in-kernel IOAPIC will not be able to receive the EOI.
>> +	 * In this case, we do lazy update of the pending EOI when
>> +	 * trying to set IOAPIC irq.
>> +	 */
>> +	if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
>> +		ioapic_lazy_update_eoi(ioapic, irq);
>> +
> 
> This is okay for the RTC, and in fact I suggest that you make it work
> like this even for Intel.  This will get rid of kvm_apicv_eoi_accelerate
> and be nicer overall.
> 
> However, it cannot work for the in-kernel PIT, because it is currently
> checking ps->irq_ack before kvm_set_irq.  Unfortunately, the in-kernel
> PIT is relying on the ack notifier to timely queue the pt->worker work
> item and reinject the missed tick.
> 
> Thus, you cannot enable APICv if ps->reinject is true.
> 
> Perhaps you can make kvm->arch.apicv_state a disabled counter?  Then
> Hyper-V can increment it when enabled, PIT can increment it when
> ps->reinject becomes true and decrement it when it becomes false;
> finally, svm.c can increment it when an SVM guest is launched and
> increment/decrement it around ExtINT handling?

I have been looking into the disabled counter idea and found a couple 
issues:

* I am seeing more calls to enable_irq_window() than the number of 
interrupt_window_interception(). This results in imbalanced 
increment/decrement of the counter.

* APICv can be deactivated due to several reasons. Currently, it is 
difficult to figure out why, and this makes debugging APICv difficult.

What if we change kvm->arch.apicv_state to kvm->arch.apicv_disable_mask 
and have each bit representing the reason for deactivating APICv.

For example:
     #define APICV_DISABLE_MASK_IRQWIN        0
     #define APICV_DISABLE_MASK_HYPERV        1
     #define APICV_DISABLE_MASK_PIT_REINJ     2
     #define APICV_DISABLE_MASK_NESTED        3

In this case, we activate APICv only if kvm->arch.apicv_disable_mask == 
0. This way, we can find out why APICv is deactivated on a particular VM 
at a particular point in time.

Thanks,
Suravee
Paolo Bonzini Oct. 31, 2019, 11:09 p.m. UTC | #4
On 31/10/19 16:17, Suthikulpanit, Suravee wrote:
> What if we change kvm->arch.apicv_state to kvm->arch.apicv_disable_mask 
> and have each bit representing the reason for deactivating APICv.
> 
> For example:
>      #define APICV_DISABLE_MASK_IRQWIN        0
>      #define APICV_DISABLE_MASK_HYPERV        1
>      #define APICV_DISABLE_MASK_PIT_REINJ     2
>      #define APICV_DISABLE_MASK_NESTED        3
> 
> In this case, we activate APICv only if kvm->arch.apicv_disable_mask == 
> 0. This way, we can find out why APICv is deactivated on a particular VM 
> at a particular point in time.

Yes, that also works for me if it makes for an easier implementation.

Paolo
diff mbox series

Patch

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index c57b7bb..ddb41ee 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -48,6 +48,11 @@ 
 static int ioapic_service(struct kvm_ioapic *vioapic, int irq,
 		bool line_status);
 
+static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
+				      struct kvm_ioapic *ioapic,
+				      int trigger_mode,
+				      int pin);
+
 static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
 					  unsigned long addr,
 					  unsigned long length)
@@ -174,6 +179,31 @@  static bool rtc_irq_check_coalesced(struct kvm_ioapic *ioapic)
 	return false;
 }
 
+static void ioapic_lazy_update_eoi(struct kvm_ioapic *ioapic, int irq)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
+
+	kvm_for_each_vcpu(i, vcpu, ioapic->kvm) {
+		if (!kvm_apic_match_dest(vcpu, NULL, KVM_APIC_DEST_NOSHORT,
+					 entry->fields.dest_id,
+					 entry->fields.dest_mode) ||
+		    kvm_apic_pending_eoi(vcpu, entry->fields.vector))
+			continue;
+
+		/*
+		 * If no longer has pending EOI in LAPICs, update
+		 * EOI for this vetor.
+		 */
+		rtc_irq_eoi(ioapic, vcpu, entry->fields.vector);
+		kvm_ioapic_update_eoi_one(vcpu, ioapic,
+					  entry->fields.trig_mode,
+					  irq);
+		break;
+	}
+}
+
 static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
 		int irq_level, bool line_status)
 {
@@ -192,6 +222,15 @@  static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
 	}
 
 	/*
+	 * In case APICv accelerate EOI write and do not trap,
+	 * in-kernel IOAPIC will not be able to receive the EOI.
+	 * In this case, we do lazy update of the pending EOI when
+	 * trying to set IOAPIC irq.
+	 */
+	if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
+		ioapic_lazy_update_eoi(ioapic, irq);
+
+	/*
 	 * Return 0 for coalesced interrupts; for edge-triggered interrupts,
 	 * this only happens if a previous edge has not been delivered due
 	 * do masking.  For level interrupts, the remote_irr field tells