diff mbox

KVM: x86: x2apic broadcast with physical mode does not work

Message ID 1412014558-16970-1-git-send-email-namit@cs.technion.ac.il (mailing list archive)
State New, archived
Headers show

Commit Message

Nadav Amit Sept. 29, 2014, 6:15 p.m. UTC
KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
(10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
physical destination modes."

The fix tries to combine broadcast in different modes.  Broadcast can be done
in the fast-path if the delivery-mode is logical and there is only a single
cluster.  Otherwise, do it slowly.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/lapic.c            | 22 ++++++++++++++++------
 arch/x86/kvm/lapic.h            |  4 ++--
 virt/kvm/ioapic.h               |  2 +-
 4 files changed, 20 insertions(+), 10 deletions(-)

Comments

Radim Krčmář Oct. 1, 2014, 12:09 a.m. UTC | #1
2014-09-29 21:15+0300, Nadav Amit:
> KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
> (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
> FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
> physical destination modes."

Btw. have you hit it in the wild?  (It was there so long that I suppose
that all OSes use a shorthand ...)

> The fix tries to combine broadcast in different modes.  Broadcast can be done
> in the fast-path if the delivery-mode is logical and there is only a single
> cluster.  Otherwise, do it slowly.

(Flat logical xapic or current logical x2apic; it doesn't have to do
 much with clusters from Intel's vocabulary ...)

> 
> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
> ---
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> @@ -558,17 +560,21 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
>  	apic_update_ppr(apic);
>  }
>  
> -int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)

(My kingdom for an explanation of how u16 got here.)

> +int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)

(bool would be better, but we should convert the whole stack, so it's a
 material for a new patch.)

>  {
> -	return dest == 0xff || kvm_apic_id(apic) == dest;
> +	u32 broadcast = apic_x2apic_mode(apic) ? (u32)-1 : 0xff;
> +
> +	return (dest == broadcast || kvm_apic_id(apic) == dest);

I'd introduce a helper for this, as it makes some sense to use it later

  static inline u32 kvm_apic_broadcast(struct kvm_lapic *apic)
  {
  	return apic_x2apic_mode(apic) ? (u32)-1 : 0xff;
  }

> -int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
> +int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)

(And a horse for reasons behind dest/mda schism.)

>  {
>  	int result = 0;
>  	u32 logical_id;
>  

>  	if (apic_x2apic_mode(apic)) {

No need to be x2apic specific, we could use kvm_apic_broadcast() to
"optimize" the xapic path as well.

xapic broadcast is 0xff in flat and cluster mode, so it would actually
be another bugfix -- we miss the latter one right now; not that it has
any users.

> +		if (mda == (u32)-1)
> +			return 1; /* x2apic broadcast */

(It would be nicer to use named symbol instead of a comment for
 explaining the code.)

> @@ -657,9 +663,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  	if (!map)
>  		goto out;
>  
> +	/* broadcast on physical or multiple clusters is done slowly */
> +	if (irq->dest_id == map->broadcast &&

(If we dropped the second && condition, broadcast check could be moved
 before we rcu_dereference the map.  Using kvm_apic_broadcast instead.)

> +	    (irq->dest_mode == 0 || map->cid_mask != 0))

It's a clever and pretty hacky use of cid_mask[1] ... I think it would
be better to have a broadcast fast-path for all modes, so we would do
something like this

  if (irq->dest == kvm_apic_broadcast(apic))
  	return HANDLE_BROADCAST(...);

in the end and having 'return 0' fallback for now is closer to our goal.

> +		goto out;


---
1: When we fix logical x2apic, it will have cid_mask > 0 and hopefully
   no-one uses it now, broken and capped at one 16 CPU cluster.
   That would leave us with flat logical mode that supports up to 8
   CPUs, which is a hard performance decision -- I think that most
   guests will use other modes and the universe will lose more cycles on
   checking for this than it would on broadcasting through slow path :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nadav Amit Oct. 1, 2014, 7:40 a.m. UTC | #2
On Oct 1, 2014, at 3:09 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2014-09-29 21:15+0300, Nadav Amit:
>> KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
>> (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
>> FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
>> physical destination modes."
> 
> Btw. have you hit it in the wild?  (It was there so long that I suppose
> that all OSes use a shorthand …)
Not in real OSes, but in some tests I run.

> 
>> The fix tries to combine broadcast in different modes.  Broadcast can be done
>> in the fast-path if the delivery-mode is logical and there is only a single
>> cluster.  Otherwise, do it slowly.
> 
> (Flat logical xapic or current logical x2apic; it doesn't have to do
> much with clusters from Intel's vocabulary …)
I meant if the ICR destination mode is logical (not the delivery-mode) and there is no more than one cluster (if clusters exist in this mode).
[It also applies to logical cluster-mode.] Does this phrasing makes sense to you?

> 
>> 
>> Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
>> ---
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> @@ -558,17 +560,21 @@ static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
>> 	apic_update_ppr(apic);
>> }
>> 
>> -int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
> 
> (My kingdom for an explanation of how u16 got here.)
> 
>> +int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)
> 
> (bool would be better, but we should convert the whole stack, so it's a
> material for a new patch.)
> 
>> {
>> -	return dest == 0xff || kvm_apic_id(apic) == dest;
>> +	u32 broadcast = apic_x2apic_mode(apic) ? (u32)-1 : 0xff;
>> +
>> +	return (dest == broadcast || kvm_apic_id(apic) == dest);
> 
> I'd introduce a helper for this, as it makes some sense to use it later
> 
>  static inline u32 kvm_apic_broadcast(struct kvm_lapic *apic)
>  {
>  	return apic_x2apic_mode(apic) ? (u32)-1 : 0xff;
>  }
> 
>> -int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
>> +int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
> 
> (And a horse for reasons behind dest/mda schism.)
> 
>> {
>> 	int result = 0;
>> 	u32 logical_id;
>> 
> 
>> 	if (apic_x2apic_mode(apic)) {
> 
> No need to be x2apic specific, we could use kvm_apic_broadcast() to
> "optimize" the xapic path as well.
> 
> xapic broadcast is 0xff in flat and cluster mode, so it would actually
> be another bugfix -- we miss the latter one right now; not that it has
> any users.
I actually submitted a fix for cluster mode - http://www.spinics.net/lists/kvm/msg106351.html
I now see Paolo did not apply it or responded. I will combine the two patches to one patch-set for v2.

> 
>> +		if (mda == (u32)-1)
>> +			return 1; /* x2apic broadcast */
> 
> (It would be nicer to use named symbol instead of a comment for
> explaining the code.)
Agreed. I looked for an existing constant, found none, and did not want to touch apicdef.h.
I will do so in v2.

> 
>> @@ -657,9 +663,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>> 	if (!map)
>> 		goto out;
>> 
>> +	/* broadcast on physical or multiple clusters is done slowly */
>> +	if (irq->dest_id == map->broadcast &&
> 
> (If we dropped the second && condition, broadcast check could be moved
> before we rcu_dereference the map.  Using kvm_apic_broadcast instead.)
> 
>> +	    (irq->dest_mode == 0 || map->cid_mask != 0))
> 
> It's a clever and pretty hacky use of cid_mask[1] ... I think it would
> be better to have a broadcast fast-path for all modes, so we would do
> something like this
> 
>  if (irq->dest == kvm_apic_broadcast(apic))
>  	return HANDLE_BROADCAST(…);
> 
> in the end and having 'return 0' fallback for now is closer to our goal.
I agree that broadcast may be done more efficiently, but broadcast using shorthand is also done slowly today.
So I think it should go into different patch.

> 
>> +		goto out;
> 
> 
> ---
> 1: When we fix logical x2apic, it will have cid_mask > 0 and hopefully
>   no-one uses it now, broken and capped at one 16 CPU cluster.
>   That would leave us with flat logical mode that supports up to 8
>   CPUs, which is a hard performance decision -- I think that most
>   guests will use other modes and the universe will lose more cycles on
>   checking for this than it would on broadcasting through slow path :)


Thanks for the feedback.

I completely agree with your revisions, yet I think they are orthogonal and should go into different patches.
Since my current project focus on validation of hypervisors, I am not sure I will have the time to do it well soon.
Regardless, I am not sure whether the performance impact of taking the slow-path is significant.

Nadav


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Oct. 1, 2014, 1:35 p.m. UTC | #3
2014-10-01 10:40+0300, Nadav Amit:
> On Oct 1, 2014, at 3:09 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
> > 2014-09-29 21:15+0300, Nadav Amit:
> >> KVM does not deliver x2APIC broadcast messages with physical mode.  Intel SDM
> >> (10.12.9 ICR Operation in x2APIC Mode) states: "A destination ID value of
> >> FFFF_FFFFH is used for broadcast of interrupts in both logical destination and
> >> physical destination modes."
> > 
> > Btw. have you hit it in the wild?  (It was there so long that I suppose
> > that all OSes use a shorthand …)
> Not in real OSes, but in some tests I run.

Great, thanks. (Would be a -stable material otherwise.)

> >> The fix tries to combine broadcast in different modes.  Broadcast can be done
> >> in the fast-path if the delivery-mode is logical and there is only a single
> >> cluster.  Otherwise, do it slowly.
> > 
> > (Flat logical xapic or current logical x2apic; it doesn't have to do
> > much with clusters from Intel's vocabulary …)
> I meant if the ICR destination mode is logical (not the delivery-mode) and there is no more than one cluster (if clusters exist in this mode).
> [It also applies to logical cluster-mode.] Does this phrasing makes sense to you?

'irq->dest_mode == 0' means that it is logical destination, but
'map->cid_mask != 0' is a variable that doesn't have a well defined
relationship to the number of clusters -- it just works for the two
cases right now.  (Flat logical xapic has 0 clusters :)

The original wording is better (xapic logical cluster has cid_mask=0xf),
and I was fine with it (just pointing out that it is confusing), which
is why I put my comment into parentheses.

It would be best to drop the condition and comment IMO.

> > xapic broadcast is 0xff in flat and cluster mode, so it would actually
> > be another bugfix -- we miss the latter one right now; not that it has
> > any users.
> I actually submitted a fix for cluster mode - http://www.spinics.net/lists/kvm/msg106351.html
> I now see Paolo did not apply it or responded. I will combine the two patches to one patch-set for v2.

Oh, I didn't see that, please do.

> >> @@ -657,9 +663,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> >> 	if (!map)
> >> 		goto out;
> >> 
> >> +	/* broadcast on physical or multiple clusters is done slowly */
> >> +	if (irq->dest_id == map->broadcast &&
> > 
> > (If we dropped the second && condition, broadcast check could be moved
> > before we rcu_dereference the map.  Using kvm_apic_broadcast instead.)
> > 
> >> +	    (irq->dest_mode == 0 || map->cid_mask != 0))
> > 
> > It's a clever and pretty hacky use of cid_mask[1] ... I think it would
> > be better to have a broadcast fast-path for all modes, so we would do
> > something like this
> > 
> >  if (irq->dest == kvm_apic_broadcast(apic))
> >  	return HANDLE_BROADCAST(…);
> > 
> > in the end and having 'return 0' fallback for now is closer to our goal.
> I agree that broadcast may be done more efficiently, but broadcast using shorthand is also done slowly today.

True, we would probably completely rewrite both variants.

> So I think it should go into different patch.

(The rework, definitely, but there isn't much difference between 'goto
 out' and 'return 0' after our broadcast check.)

> > 
> >> +		goto out;
> > 
> > 
> > ---
> > 1: When we fix logical x2apic, it will have cid_mask > 0 and hopefully
> >   no-one uses it now, broken and capped at one 16 CPU cluster.
> >   That would leave us with flat logical mode that supports up to 8
> >   CPUs, which is a hard performance decision -- I think that most
> >   guests will use other modes and the universe will lose more cycles on
> >   checking for this than it would on broadcasting through slow path :)
> 
> 
> Thanks for the feedback.
> 
> I completely agree with your revisions, yet I think they are orthogonal and should go into different patches.
> Since my current project focus on validation of hypervisors, I am not sure I will have the time to do it well soon.

Sure, what you are doing seems much more important.

> Regardless, I am not sure whether the performance impact of taking the slow-path is significant.

I agree, so there is little reason to do the extra check for performance
reasons.  The check makes the code worse, which makes performance gain
the only possible excuse for it.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nadav Amit Oct. 1, 2014, 7:33 p.m. UTC | #4
On Oct 1, 2014, at 4:35 PM, Radim Kr?má? <rkrcmar@redhat.com> wrote:

> 2014-10-01 10:40+0300, Nadav Amit:
>> On Oct 1, 2014, at 3:09 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
>>> 2014-09-29 21:15+0300, Nadav Amit:
>>>> @@ -657,9 +663,13 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>>>> 	if (!map)
>>>> 		goto out;
>>>> 
>>>> +	/* broadcast on physical or multiple clusters is done slowly */
>>>> +	if (irq->dest_id == map->broadcast &&
>>> 
>>> (If we dropped the second && condition, broadcast check could be moved
>>> before we rcu_dereference the map.  Using kvm_apic_broadcast instead.)
>>> 
>>>> +	    (irq->dest_mode == 0 || map->cid_mask != 0))
>>> 
>>> It's a clever and pretty hacky use of cid_mask[1] ... I think it would
>>> be better to have a broadcast fast-path for all modes, so we would do
>>> something like this
>>> 
>>> if (irq->dest == kvm_apic_broadcast(apic))
>>> 	return HANDLE_BROADCAST(…);
>>> 
>>> in the end and having 'return 0' fallback for now is closer to our goal.
>> I agree that broadcast may be done more efficiently, but broadcast using shorthand is also done slowly today.
> 
> True, we would probably completely rewrite both variants.
> 
>> So I think it should go into different patch.
> 
> (The rework, definitely, but there isn't much difference between 'goto
> out' and 'return 0' after our broadcast check.)
> 
>>> 
>>>> +		goto out;


I tried to implement your suggestion, but I don’t manage to make a nice and valid fix.
The problem is that to determine the broadcast mode in the way you suggested, you need to have the lapic struct of one of the vCPUs with enabled apic.
In kvm_irq_delivery_to_apic_fast, src may be NULL (if not issued by another CPU), or of a vCPU with disabled APIC. In such case I don’t know how to determine what is the broadcast message without looking at apic_map.
That is the reason I used the apic_map before. Since I used the it, I do not use return but “goto out”, so rcu_read_unlock would be called.

So unless you can suggest a better way, I would leave this part as is, and determine broadcast according to the new map->broadcast, as I did before.

Thanks,
Nadav
Radim Krčmář Oct. 1, 2014, 8:59 p.m. UTC | #5
2014-10-01 22:33+0300, Nadav Amit:
> I tried to implement your suggestion, but I don’t manage to make a nice and valid fix.
> The problem is that to determine the broadcast mode in the way you suggested, you need to have the lapic struct of one of the vCPUs with enabled apic.
> In kvm_irq_delivery_to_apic_fast, src may be NULL (if not issued by another CPU), or of a vCPU with disabled APIC. In such case I don’t know how to determine what is the broadcast message without looking at apic_map.

I see, sorry, apic_map is our best bet.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7d603a7..6f2be83 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -542,7 +542,7 @@  struct kvm_apic_map {
 	struct rcu_head rcu;
 	u8 ldr_bits;
 	/* fields bellow are used to decode ldr values in different modes */
-	u32 cid_shift, cid_mask, lid_mask;
+	u32 cid_shift, cid_mask, lid_mask, broadcast;
 	struct kvm_lapic *phys_map[256];
 	/* first index is cluster id second is cpu id in a cluster */
 	struct kvm_lapic *logical_map[16][16];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b8345dd..16404cd 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -149,6 +149,7 @@  static void recalculate_apic_map(struct kvm *kvm)
 	new->cid_shift = 8;
 	new->cid_mask = 0;
 	new->lid_mask = 0xff;
+	new->broadcast = 0xff;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		struct kvm_lapic *apic = vcpu->arch.apic;
@@ -170,6 +171,7 @@  static void recalculate_apic_map(struct kvm *kvm)
 			new->cid_shift = 16;
 			new->cid_mask = (1 << KVM_X2APIC_CID_BITS) - 1;
 			new->lid_mask = 0xffff;
+			new->broadcast = (u32)-1;
 		} else if (kvm_apic_sw_enabled(apic) &&
 				!new->cid_mask /* flat mode */ &&
 				kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_CLUSTER) {
@@ -558,17 +560,21 @@  static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
 	apic_update_ppr(apic);
 }
 
-int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest)
+int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest)
 {
-	return dest == 0xff || kvm_apic_id(apic) == dest;
+	u32 broadcast = apic_x2apic_mode(apic) ? (u32)-1 : 0xff;
+
+	return (dest == broadcast || kvm_apic_id(apic) == dest);
 }
 
-int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
+int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda)
 {
 	int result = 0;
 	u32 logical_id;
 
 	if (apic_x2apic_mode(apic)) {
+		if (mda == (u32)-1)
+			return 1; /* x2apic broadcast */
 		logical_id = kvm_apic_get_reg(apic, APIC_LDR);
 		return logical_id & mda;
 	}
@@ -595,7 +601,7 @@  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda)
 }
 
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
-			   int short_hand, int dest, int dest_mode)
+			   int short_hand, unsigned int dest, int dest_mode)
 {
 	int result = 0;
 	struct kvm_lapic *target = vcpu->arch.apic;
@@ -657,9 +663,13 @@  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	if (!map)
 		goto out;
 
+	/* broadcast on physical or multiple clusters is done slowly */
+	if (irq->dest_id == map->broadcast &&
+	    (irq->dest_mode == 0 || map->cid_mask != 0))
+		goto out;
+
 	if (irq->dest_mode == 0) { /* physical mode */
-		if (irq->delivery_mode == APIC_DM_LOWEST ||
-				irq->dest_id == 0xff)
+		if (irq->delivery_mode == APIC_DM_LOWEST)
 			goto out;
 		dst = &map->phys_map[irq->dest_id & 0xff];
 	} else {
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..3fd6c22 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -55,8 +55,8 @@  void kvm_apic_set_version(struct kvm_vcpu *vcpu);
 
 void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
-int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
-int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
+int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u32 dest);
+int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u32 mda);
 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
 		unsigned long *dest_map);
 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index e23b706..31725a3 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -83,7 +83,7 @@  static inline struct kvm_ioapic *ioapic_irqchip(struct kvm *kvm)
 
 void kvm_rtc_eoi_tracking_restore_one(struct kvm_vcpu *vcpu);
 int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
-		int short_hand, int dest, int dest_mode);
+		int short_hand, unsigned int dest, int dest_mode);
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
 void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector,
 			int trigger_mode);