
[1/2] KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid

Message ID 20180307124055.13800-2-marc.zyngier@arm.com (mailing list archive)
State New, archived

Commit Message

Marc Zyngier March 7, 2018, 12:40 p.m. UTC
The vgic code is trying to be clever when injecting GICv2 SGIs,
and will happily populate LRs with the same interrupt number if
they come from multiple vcpus (after all, they are distinct
interrupt sources).

Unfortunately, this is against the letter of the architecture,
and the GICv2 architecture spec says "Each valid interrupt stored
in the List registers must have a unique VirtualID for that
virtual CPU interface.". GICv3 has similar (although slightly
ambiguous) restrictions.

This results in guests locking up when using GICv2-on-GICv3, for
example. The obvious fix is to stop trying so hard, and inject
the SGI for a single source vcpu per guest entry. After all, pending
SGIs with multiple source vcpus are pretty rare, and are mostly seen
in scenarios where the physical CPUs are severely overcommitted.

Cc: stable@vger.kernel.org
Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/vgic/vgic.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

Comments

Andre Przywara March 7, 2018, 2:44 p.m. UTC | #1
Hi,

On 07/03/18 12:40, Marc Zyngier wrote:
> The vgic code is trying to be clever when injecting GICv2 SGIs,
> and will happily populate LRs with the same interrupt number if
> they come from multiple vcpus (after all, they are distinct
> interrupt sources).
> 
> Unfortunately, this is against the letter of the architecture,
> and the GICv2 architecture spec says "Each valid interrupt stored
> in the List registers must have a unique VirtualID for that
> virtual CPU interface.". GICv3 has similar (although slightly
> ambiguous) restrictions.

Ah, good catch. I had silently assumed that this "unique interrupt"
restriction included the source ID, but fair enough.

> This results in guests locking up when using GICv2-on-GICv3, for
> example. The obvious fix is to stop trying so hard, and inject
> a single vcpu per SGI per guest entry. After all, pending SGIs
> with multiple source vcpus are pretty rare, and are mostly seen
> in scenario where the physical CPUs are severely overcomitted.
> 
> Cc: stable@vger.kernel.org
> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks!
Andre.

> ---
>  virt/kvm/arm/vgic/vgic.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index c7c5ef190afa..1f7ff175f47b 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
>  	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
>  		spin_lock(&irq->irq_lock);
>  
> -		if (unlikely(vgic_target_oracle(irq) != vcpu))
> -			goto next;
> -
> -		/*
> -		 * If we get an SGI with multiple sources, try to get
> -		 * them in all at once.
> -		 */
> -		do {
> +		if (likely(vgic_target_oracle(irq) == vcpu))
>  			vgic_populate_lr(vcpu, irq, count++);
> -		} while (irq->source && count < kvm_vgic_global_state.nr_lr);
>  
> -next:
>  		spin_unlock(&irq->irq_lock);
>  
>  		if (count == kvm_vgic_global_state.nr_lr) {
>
Christoffer Dall March 7, 2018, 11:34 p.m. UTC | #2
On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> The vgic code is trying to be clever when injecting GICv2 SGIs,
> and will happily populate LRs with the same interrupt number if
> they come from multiple vcpus (after all, they are distinct
> interrupt sources).
>
> Unfortunately, this is against the letter of the architecture,
> and the GICv2 architecture spec says "Each valid interrupt stored
> in the List registers must have a unique VirtualID for that
> virtual CPU interface.". GICv3 has similar (although slightly
> ambiguous) restrictions.
>
> This results in guests locking up when using GICv2-on-GICv3, for
> example. The obvious fix is to stop trying so hard, and inject
> a single vcpu per SGI per guest entry. After all, pending SGIs
> with multiple source vcpus are pretty rare, and are mostly seen
> in scenario where the physical CPUs are severely overcomitted.
>
> Cc: stable@vger.kernel.org
> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/vgic/vgic.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index c7c5ef190afa..1f7ff175f47b 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
>         list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
>                 spin_lock(&irq->irq_lock);
>
> -               if (unlikely(vgic_target_oracle(irq) != vcpu))
> -                       goto next;
> -
> -               /*
> -                * If we get an SGI with multiple sources, try to get
> -                * them in all at once.
> -                */
> -               do {
> +               if (likely(vgic_target_oracle(irq) == vcpu))
>                         vgic_populate_lr(vcpu, irq, count++);

I think we need to change vgic_populate_lr to set the EOI maintenance
interrupt flag so that when the interrupt is deactivated, if there are
additional pending sources, we exit the guest and pick up the
interrupt.

An alternative would be to set the underflow interrupt, but I don't
think that would be correct for multiple priorities, because the SGI
could have a higher priority than other pending interrupts we put in
the LR.

Thanks,
Christoffer
Marc Zyngier March 8, 2018, 10:19 a.m. UTC | #3
On 07/03/18 23:34, Christoffer Dall wrote:
> On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> The vgic code is trying to be clever when injecting GICv2 SGIs,
>> and will happily populate LRs with the same interrupt number if
>> they come from multiple vcpus (after all, they are distinct
>> interrupt sources).
>>
>> Unfortunately, this is against the letter of the architecture,
>> and the GICv2 architecture spec says "Each valid interrupt stored
>> in the List registers must have a unique VirtualID for that
>> virtual CPU interface.". GICv3 has similar (although slightly
>> ambiguous) restrictions.
>>
>> This results in guests locking up when using GICv2-on-GICv3, for
>> example. The obvious fix is to stop trying so hard, and inject
>> a single vcpu per SGI per guest entry. After all, pending SGIs
>> with multiple source vcpus are pretty rare, and are mostly seen
>> in scenario where the physical CPUs are severely overcomitted.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  virt/kvm/arm/vgic/vgic.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index c7c5ef190afa..1f7ff175f47b 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
>>         list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
>>                 spin_lock(&irq->irq_lock);
>>
>> -               if (unlikely(vgic_target_oracle(irq) != vcpu))
>> -                       goto next;
>> -
>> -               /*
>> -                * If we get an SGI with multiple sources, try to get
>> -                * them in all at once.
>> -                */
>> -               do {
>> +               if (likely(vgic_target_oracle(irq) == vcpu))
>>                         vgic_populate_lr(vcpu, irq, count++);
> 
> I think we need to change vgic_populate_lr to set the EOI maintenance
> interrupt flag so that when the interrupt is deactivated, if there are
> additional pending sources, we exit the guest and pick up the
> interrupt.

Potentially. We need to be careful about the semantics of EOI MI
with non-level interrupts (see the other thread about EOI signalling).

> An alternative would be to set the underflow interrupt, but I don't
> think that would be correct for multiple priorities, because the SGI
> could have a higher priority than other pending interrupts we put in
> the LR.

I don't think priorities are the issue (after all, we already sort the
irqs in order of priority). My worry is that underflow is allowed to
fire if there is one interrupt pending, which implies that you could
end up in a livelock scenario if you only have one SGI pending with
multiple sources.

Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
GICv2), which delivers an MI if no pending interrupts are present. Once
the SGI has been activated, we're guaranteed to be able to inject a new
pending one.

I like the latter, because it doesn't overload the rest of the code with
new semantics. Thoughts?

	M.
Christoffer Dall March 8, 2018, 4:02 p.m. UTC | #4
On Thu, Mar 08, 2018 at 10:19:49AM +0000, Marc Zyngier wrote:
> On 07/03/18 23:34, Christoffer Dall wrote:
> > On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> >> The vgic code is trying to be clever when injecting GICv2 SGIs,
> >> and will happily populate LRs with the same interrupt number if
> >> they come from multiple vcpus (after all, they are distinct
> >> interrupt sources).
> >>
> >> Unfortunately, this is against the letter of the architecture,
> >> and the GICv2 architecture spec says "Each valid interrupt stored
> >> in the List registers must have a unique VirtualID for that
> >> virtual CPU interface.". GICv3 has similar (although slightly
> >> ambiguous) restrictions.
> >>
> >> This results in guests locking up when using GICv2-on-GICv3, for
> >> example. The obvious fix is to stop trying so hard, and inject
> >> a single vcpu per SGI per guest entry. After all, pending SGIs
> >> with multiple source vcpus are pretty rare, and are mostly seen
> >> in scenario where the physical CPUs are severely overcomitted.
> >>
> >> Cc: stable@vger.kernel.org
> >> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  virt/kvm/arm/vgic/vgic.c | 11 +----------
> >>  1 file changed, 1 insertion(+), 10 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> >> index c7c5ef190afa..1f7ff175f47b 100644
> >> --- a/virt/kvm/arm/vgic/vgic.c
> >> +++ b/virt/kvm/arm/vgic/vgic.c
> >> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
> >>         list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
> >>                 spin_lock(&irq->irq_lock);
> >>
> >> -               if (unlikely(vgic_target_oracle(irq) != vcpu))
> >> -                       goto next;
> >> -
> >> -               /*
> >> -                * If we get an SGI with multiple sources, try to get
> >> -                * them in all at once.
> >> -                */
> >> -               do {
> >> +               if (likely(vgic_target_oracle(irq) == vcpu))
> >>                         vgic_populate_lr(vcpu, irq, count++);
> > 
> > I think we need to change vgic_populate_lr to set the EOI maintenance
> > interrupt flag so that when the interrupt is deactivated, if there are
> > additional pending sources, we exit the guest and pick up the
> > interrupt.
> 
> Potentially. We need to be careful about about the semantics of EOI MI
> with non-level interrupts (see the other thread about EOI signalling).

I'll have a look.

> 
> > An alternative would be to set the underflow interrupt, but I don't
> > think that would be correct for multiple priorities, because the SGI
> > could have a higher priority than other pending interrupts we put in
> > the LR.
> 
> I don't think priorities are the issue (after all, we already sort the
> irqs in order of priority). 

Yes, but assume you have three pending interrupts, one SGI from two
sources, and one SPI, and assume that the SGI has priority 1 and SPI
priority 2 (lower means higher priority), then I think with underflow or
the no-pending interrupt flag, we'll deliver the SGI from the first
source, then the SPI, and only then exit to pick up the last SGI from
the other source.  That's not how I understand the GIC architecture is
supposed to work.  Am I missing something?

> My worry is that underflow is allowed to
> fire if there is one interrupt pending, which implies that you could
> end-up in a livelock scenario if you only have one SGI pending with
> multiple sources.

Yes, doesn't work, so I think it should be a maintenance interrupt on
EOI.

> 
> Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
> GICv2), which delivers a a MI if no pending interrupts are present. Once
> the SGI has been activated, we're guaranteed to be able to inject a new
> pending one.
> 
> I like the latter, because it doesn't overload the rest of the code with
> new semantics. Thoughts?
> 

I'm fine with that if I can be proven wrong about the multiple sources
and priorities.

Thanks,
-Christoffer
Marc Zyngier March 8, 2018, 5:04 p.m. UTC | #5
On Thu, 08 Mar 2018 16:02:42 +0000,
Christoffer Dall wrote:
> 
> On Thu, Mar 08, 2018 at 10:19:49AM +0000, Marc Zyngier wrote:
> > On 07/03/18 23:34, Christoffer Dall wrote:
> > > On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > >> The vgic code is trying to be clever when injecting GICv2 SGIs,
> > >> and will happily populate LRs with the same interrupt number if
> > >> they come from multiple vcpus (after all, they are distinct
> > >> interrupt sources).
> > >>
> > >> Unfortunately, this is against the letter of the architecture,
> > >> and the GICv2 architecture spec says "Each valid interrupt stored
> > >> in the List registers must have a unique VirtualID for that
> > >> virtual CPU interface.". GICv3 has similar (although slightly
> > >> ambiguous) restrictions.
> > >>
> > >> This results in guests locking up when using GICv2-on-GICv3, for
> > >> example. The obvious fix is to stop trying so hard, and inject
> > >> a single vcpu per SGI per guest entry. After all, pending SGIs
> > >> with multiple source vcpus are pretty rare, and are mostly seen
> > >> in scenario where the physical CPUs are severely overcomitted.
> > >>
> > >> Cc: stable@vger.kernel.org
> > >> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> > >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > >> ---
> > >>  virt/kvm/arm/vgic/vgic.c | 11 +----------
> > >>  1 file changed, 1 insertion(+), 10 deletions(-)
> > >>
> > >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > >> index c7c5ef190afa..1f7ff175f47b 100644
> > >> --- a/virt/kvm/arm/vgic/vgic.c
> > >> +++ b/virt/kvm/arm/vgic/vgic.c
> > >> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
> > >>         list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
> > >>                 spin_lock(&irq->irq_lock);
> > >>
> > >> -               if (unlikely(vgic_target_oracle(irq) != vcpu))
> > >> -                       goto next;
> > >> -
> > >> -               /*
> > >> -                * If we get an SGI with multiple sources, try to get
> > >> -                * them in all at once.
> > >> -                */
> > >> -               do {
> > >> +               if (likely(vgic_target_oracle(irq) == vcpu))
> > >>                         vgic_populate_lr(vcpu, irq, count++);
> > > 
> > > I think we need to change vgic_populate_lr to set the EOI maintenance
> > > interrupt flag so that when the interrupt is deactivated, if there are
> > > additional pending sources, we exit the guest and pick up the
> > > interrupt.
> > 
> > Potentially. We need to be careful about about the semantics of EOI MI
> > with non-level interrupts (see the other thread about EOI signalling).
> 
> I'll have a look.
> 
> > 
> > > An alternative would be to set the underflow interrupt, but I don't
> > > think that would be correct for multiple priorities, because the SGI
> > > could have a higher priority than other pending interrupts we put in
> > > the LR.
> > 
> > I don't think priorities are the issue (after all, we already sort the
> > irqs in order of priority). 
> 
> Yes, but assume you have three pending interrupts, one SGI from two
> sources, and one SPI, and assume that the SGI has priority 1 and SPI
> priority 2 (lower means higher priority), then I think with underflow or
> the no-pending interrupt flag, we'll deliver the SGI from the first
> source, and then the SPI, and then exiting to pick up the last SGI from
> the other source.  That's not how I understand the GIC architecture is
> supposed to work.  Am I missing something?

No, you do have a point here. I'm so glad this is a v2 only thing.

> 
> > My worry is that underflow is allowed to
> > fire if there is one interrupt pending, which implies that you could
> > end-up in a livelock scenario if you only have one SGI pending with
> > multiple sources.
> 
> Yes, doesn't work, so I think it should be a maintenance interrupt on
> EOI.
> 
> > 
> > Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on
> > GICv2), which delivers a a MI if no pending interrupts are present. Once
> > the SGI has been activated, we're guaranteed to be able to inject a new
> > pending one.
> > 
> > I like the latter, because it doesn't overload the rest of the code with
> > new semantics. Thoughts?
> > 
> 
> I'm fine with that if I can be proven wrong about the multiple sources
> and priorities.

I'll have a look at respining this with MI on EOI.

Thanks,

	M.
Marc Zyngier March 8, 2018, 6:39 p.m. UTC | #6
On Thu, 08 Mar 2018 17:04:38 +0000,
Marc Zyngier wrote:
> 
> On Thu, 08 Mar 2018 16:02:42 +0000,
> Christoffer Dall wrote:
> > 
> > On Thu, Mar 08, 2018 at 10:19:49AM +0000, Marc Zyngier wrote:
> > > On 07/03/18 23:34, Christoffer Dall wrote:
> > > > On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > > >> The vgic code is trying to be clever when injecting GICv2 SGIs,
> > > >> and will happily populate LRs with the same interrupt number if
> > > >> they come from multiple vcpus (after all, they are distinct
> > > >> interrupt sources).
> > > >>
> > > >> Unfortunately, this is against the letter of the architecture,
> > > >> and the GICv2 architecture spec says "Each valid interrupt stored
> > > >> in the List registers must have a unique VirtualID for that
> > > >> virtual CPU interface.". GICv3 has similar (although slightly
> > > >> ambiguous) restrictions.
> > > >>
> > > >> This results in guests locking up when using GICv2-on-GICv3, for
> > > >> example. The obvious fix is to stop trying so hard, and inject
> > > >> a single vcpu per SGI per guest entry. After all, pending SGIs
> > > >> with multiple source vcpus are pretty rare, and are mostly seen
> > > >> in scenario where the physical CPUs are severely overcomitted.
> > > >>
> > > >> Cc: stable@vger.kernel.org
> > > >> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> > > >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > >> ---
> > > >>  virt/kvm/arm/vgic/vgic.c | 11 +----------
> > > >>  1 file changed, 1 insertion(+), 10 deletions(-)
> > > >>
> > > >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > > >> index c7c5ef190afa..1f7ff175f47b 100644
> > > >> --- a/virt/kvm/arm/vgic/vgic.c
> > > >> +++ b/virt/kvm/arm/vgic/vgic.c
> > > >> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
> > > >>         list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
> > > >>                 spin_lock(&irq->irq_lock);
> > > >>
> > > >> -               if (unlikely(vgic_target_oracle(irq) != vcpu))
> > > >> -                       goto next;
> > > >> -
> > > >> -               /*
> > > >> -                * If we get an SGI with multiple sources, try to get
> > > >> -                * them in all at once.
> > > >> -                */
> > > >> -               do {
> > > >> +               if (likely(vgic_target_oracle(irq) == vcpu))
> > > >>                         vgic_populate_lr(vcpu, irq, count++);
> > > > 
> > > > I think we need to change vgic_populate_lr to set the EOI maintenance
> > > > interrupt flag so that when the interrupt is deactivated, if there are
> > > > additional pending sources, we exit the guest and pick up the
> > > > interrupt.
> > > 
> > > Potentially. We need to be careful about about the semantics of EOI MI
> > > with non-level interrupts (see the other thread about EOI signalling).
> > 
> > I'll have a look.
> > 
> > > 
> > > > An alternative would be to set the underflow interrupt, but I don't
> > > > think that would be correct for multiple priorities, because the SGI
> > > > could have a higher priority than other pending interrupts we put in
> > > > the LR.
> > > 
> > > I don't think priorities are the issue (after all, we already sort the
> > > irqs in order of priority). 
> > 
> > Yes, but assume you have three pending interrupts, one SGI from two
> > sources, and one SPI, and assume that the SGI has priority 1 and SPI
> > priority 2 (lower means higher priority), then I think with underflow or
> > the no-pending interrupt flag, we'll deliver the SGI from the first
> > source, and then the SPI, and then exiting to pick up the last SGI from
> > the other source.  That's not how I understand the GIC architecture is
> > supposed to work.  Am I missing something?
> 
> No, you do have a point here. I'm so glad this is a v2 only thing.
> 
> > 
> > > My worry is that underflow is allowed to
> > > fire if there is one interrupt pending, which implies that you could
> > > end-up in a livelock scenario if you only have one SGI pending with
> > > multiple sources.
> > 
> > Yes, doesn't work, so I think it should be a maintenance interrupt on
> > EOI.

Thinking of it a bit more: MI on EOI doesn't offer much more guarantee
in the way of priority ordering. Taking your example above: Even if
you generate a MI when EOIing the SGI, there is no guarantee that
you'll take the MI before you've acked the SPI.

If you really want to offer that absolute guarantee that all the
multi-source SGIs of higher priority are delivered before anything
else, then you must make sure that only the SGIs are present in the
LRs, excluding any other interrupt on lower priority until you've
queued them all.

At that stage, I wonder if there is a point in doing anything at
all. The GICv2 architecture is too rubbish for words.

	M.
Christoffer Dall March 9, 2018, 9:39 p.m. UTC | #7
On Thu, Mar 08, 2018 at 06:39:20PM +0000, Marc Zyngier wrote:
> On Thu, 08 Mar 2018 17:04:38 +0000,
> Marc Zyngier wrote:
> > 
> > On Thu, 08 Mar 2018 16:02:42 +0000,
> > Christoffer Dall wrote:
> > > 
> > > On Thu, Mar 08, 2018 at 10:19:49AM +0000, Marc Zyngier wrote:
> > > > On 07/03/18 23:34, Christoffer Dall wrote:
> > > > > On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > > > >> The vgic code is trying to be clever when injecting GICv2 SGIs,
> > > > >> and will happily populate LRs with the same interrupt number if
> > > > >> they come from multiple vcpus (after all, they are distinct
> > > > >> interrupt sources).
> > > > >>
> > > > >> Unfortunately, this is against the letter of the architecture,
> > > > >> and the GICv2 architecture spec says "Each valid interrupt stored
> > > > >> in the List registers must have a unique VirtualID for that
> > > > >> virtual CPU interface.". GICv3 has similar (although slightly
> > > > >> ambiguous) restrictions.
> > > > >>
> > > > >> This results in guests locking up when using GICv2-on-GICv3, for
> > > > >> example. The obvious fix is to stop trying so hard, and inject
> > > > >> a single vcpu per SGI per guest entry. After all, pending SGIs
> > > > >> with multiple source vcpus are pretty rare, and are mostly seen
> > > > >> in scenario where the physical CPUs are severely overcomitted.
> > > > >>
> > > > >> Cc: stable@vger.kernel.org
> > > > >> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
> > > > >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > >> ---
> > > > >>  virt/kvm/arm/vgic/vgic.c | 11 +----------
> > > > >>  1 file changed, 1 insertion(+), 10 deletions(-)
> > > > >>
> > > > >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > > > >> index c7c5ef190afa..1f7ff175f47b 100644
> > > > >> --- a/virt/kvm/arm/vgic/vgic.c
> > > > >> +++ b/virt/kvm/arm/vgic/vgic.c
> > > > >> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
> > > > >>         list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
> > > > >>                 spin_lock(&irq->irq_lock);
> > > > >>
> > > > >> -               if (unlikely(vgic_target_oracle(irq) != vcpu))
> > > > >> -                       goto next;
> > > > >> -
> > > > >> -               /*
> > > > >> -                * If we get an SGI with multiple sources, try to get
> > > > >> -                * them in all at once.
> > > > >> -                */
> > > > >> -               do {
> > > > >> +               if (likely(vgic_target_oracle(irq) == vcpu))
> > > > >>                         vgic_populate_lr(vcpu, irq, count++);
> > > > > 
> > > > > I think we need to change vgic_populate_lr to set the EOI maintenance
> > > > > interrupt flag so that when the interrupt is deactivated, if there are
> > > > > additional pending sources, we exit the guest and pick up the
> > > > > interrupt.
> > > > 
> > > > Potentially. We need to be careful about about the semantics of EOI MI
> > > > with non-level interrupts (see the other thread about EOI signalling).
> > > 
> > > I'll have a look.
> > > 
> > > > 
> > > > > An alternative would be to set the underflow interrupt, but I don't
> > > > > think that would be correct for multiple priorities, because the SGI
> > > > > could have a higher priority than other pending interrupts we put in
> > > > > the LR.
> > > > 
> > > > I don't think priorities are the issue (after all, we already sort the
> > > > irqs in order of priority). 
> > > 
> > > Yes, but assume you have three pending interrupts, one SGI from two
> > > sources, and one SPI, and assume that the SGI has priority 1 and SPI
> > > priority 2 (lower means higher priority), then I think with underflow or
> > > the no-pending interrupt flag, we'll deliver the SGI from the first
> > > source, and then the SPI, and then exiting to pick up the last SGI from
> > > the other source.  That's not how I understand the GIC architecture is
> > > supposed to work.  Am I missing something?
> > 
> > No, you do have a point here. I'm so glad this is a v2 only thing.
> > 
> > > 
> > > > My worry is that underflow is allowed to
> > > > fire if there is one interrupt pending, which implies that you could
> > > > end-up in a livelock scenario if you only have one SGI pending with
> > > > multiple sources.
> > > 
> > > Yes, doesn't work, so I think it should be a maintenance interrupt on
> > > EOI.
> 
> Thinking of it a bit more: MI on EOI doesn't offer much more guarantee
> in the way of priority ordering. Taking your example above: Even if
> you generate a MI when EOIing the SGI, there is no guarantee that
> you'll take the MI before you've acked the SPI.

There's no guarantee, but at least you're attempting to process the
SGIs first.  It's the best we can do, even if it's not completely
correct, kinda thing...

> 
> If you really want to offer that absolute guarantee that all the
> multi-source SGIs of higher priority are delivered before anything
> else, then you must make sure that only the SGIs are present in the
> LRs, excluding any other interrupt on lower priority until you've
> queued them all.

Yes, that sucks!  It might not be too hard to implement; it's basically
an early out of the loop traversing the AP list, but it's still an
annoying complication.

> 
> At that stage, I wonder if there is a point in doing anything at
> all. The GICv2 architecture is too rubbish for words.
> 

The case we do need to worry about is the guest processing all its
interrupts and not exiting while there is actually still an SGI pending.
At that point, we can either do it with the "no interrupts pending
maintenance interrupt" or with the "EOI maintenance interrupt", and I
think the latter at least gets us slightly closer to the architecture
for a non-virtualized system.

Thanks,
-Christoffer
Marc Zyngier March 10, 2018, 1:57 p.m. UTC | #8
Hi Christoffer,

On Fri, 09 Mar 2018 21:39:31 +0000,
Christoffer Dall wrote:
> 
> On Thu, Mar 08, 2018 at 06:39:20PM +0000, Marc Zyngier wrote:
> > Thinking of it a bit more: MI on EOI doesn't offer much more guarantee
> > in the way of priority ordering. Taking your example above: Even if
> > you generate a MI when EOIing the SGI, there is no guarantee that
> > you'll take the MI before you've acked the SPI.
> 
> There's no guarantee, but at least you're attempting at processing the
> SGIs in first.  It's the best we can do, but not completely correct,
> kinda thing...
> 
> > 
> > If you really want to offer that absolute guarantee that all the
> > multi-source SGIs of higher priority are delivered before anything
> > else, then you must make sure that only the SGIs are present in the
> > LRs, excluding any other interrupt on lower priority until you've
> > queued them all.
> 
> Yes, that sucks!  Might not be too hard to implement, it's basically an
> early out of the loop traversing the AP list, but just an annoying
> complication.

Yeah, it is a bit gross. The way I implemented it is by forcing the AP
list to be sorted if there is any multi-SGI in the pipeline, and
bailing out early as soon as we see an interrupt of a lower priority
than the first multi-SGI. That way, we only pay the overhead in the
case that combines multi-SGI and lower-priority interrupts.

> > At that stage, I wonder if there is a point in doing anything at
> > all. The GICv2 architecture is too rubbish for words.
> > 
> 
> The case we do need to worry about is the guest processing all its
> interrupts and not exiting while there is actually still an SGI pending.
> At that point, we can either do it with the "no interrupts pending
> maintenance interrupt" or with the "EOI maintenance interrupt", and I
> think the latter at least gets us slightly closer to the architecture
> for a non-virtualized system.

I think that this is where we disagree. I don't see anything in the
architecture that mandates that we should present the SGIs before
anything else. All we need to do is to ensure that interrupts of
higher priority are presented before anything else. It is perfectly
acceptable for an implementation to deliver SGI0, then SPI3, and SGI0
(from another CPU) after that, as long as SPI3 isn't of lesser
priority than SGI0.

Another thing I dislike about using EOI for that is that it forces us
to propagate the knowledge of the multi-SGI horror further down the
stack, down to both implementations of vgic_populate_lr. NPIE allows
us to keep that knowledge local. But that's an orthogonal issue, and
we can further argue/bikeshed about the respective merits of both
solutions once we have something that fits the sorry state of the
GICv2 architecture ;-).

Thanks,

	M.
Christoffer Dall March 11, 2018, 1:51 a.m. UTC | #9
On Sat, Mar 10, 2018 at 1:57 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Christoffer,
>
> On Fri, 09 Mar 2018 21:39:31 +0000,
> Christoffer Dall wrote:
>>
>> On Thu, Mar 08, 2018 at 06:39:20PM +0000, Marc Zyngier wrote:
>> > Thinking of it a bit more: MI on EOI doesn't offer much more guarantee
>> > in the way of priority ordering. Taking your example above: Even if
>> > you generate a MI when EOIing the SGI, there is no guarantee that
>> > you'll take the MI before you've acked the SPI.
>>
>> There's no guarantee, but at least you're attempting to process the
>> SGIs first.  It's the best we can do, but not completely correct,
>> kinda thing...
>>
>> >
>> > If you really want to offer that absolute guarantee that all the
>> > multi-source SGIs of higher priority are delivered before anything
>> > else, then you must make sure that only the SGIs are present in the
>> > LRs, excluding any other interrupt on lower priority until you've
>> > queued them all.
>>
>> Yes, that sucks!  Might not be too hard to implement, it's basically an
>> early out of the loop traversing the AP list, but just an annoying
>> complication.
>
> Yeah, it is a bit gross. The way I implemented it is by forcing the AP
> list to be sorted if there is any multi-SGI in the pipeline, and early
> out as soon as we see an interrupt of a lower priority than the first
> multi-SGI. That way, we only have an overhead in the case that
> combines multi-SGI and lower priority interrupts.
>

yes, that's what I had in mind as well.

>> > At that stage, I wonder if there is a point in doing anything at
>> > all. The GICv2 architecture is too rubbish for words.
>> >
>>
>> The case we do need to worry about is the guest processing all its
>> interrupts and not exiting while there is actually still an SGI pending.
>> At that point, we can either do it with the "no interrupts pending
>> maintenance interrupt" or with the "EOI maintenance interrupt", and I
>> think the latter at least gets us slightly closer to the architecture
>> for a non-virtualized system.
>
> I think that this is where we disagree.

I don't think we disagree, I must have expressed myself poorly...

> I don't see anything in the
> architecture that mandates that we should present the SGIs before
> anything else.

Neither do I.

> All we need to do is to ensure that interrupts of
> higher priority are presented before anything else.

Agreed.

> It is perfectly
> acceptable for an implementation to deliver SGI0, then SPI3, and SGI0
> (from another CPU) after that, as long as SPI3 isn't of lesser
> priority than SGI0.

Yes, but what we cannot do is let the guest deliver SGI0, then SPI3,
and then loop forever without delivering SGI0 from another CPU.
That's why I said "the guest processing all its interrupts and not
exiting while there is actually still an SGI pending" and said that we
could use either the EOI or the NPIE trick.
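[Editor's illustration] The cost of the one-source-per-entry scheme is easy to quantify with a toy calculation (a sketch only, not kernel code): once an SGI is injected for a single source vcpu per guest entry, draining a multi-source SGI takes one guest entry per set source bit, and it is exactly the NPIE or EOI maintenance interrupt that guarantees those extra entries happen instead of the guest looping with sources stranded.

```c
/* One source bit is delivered per guest entry; count how many guest
 * entries it takes to drain a multi-source SGI's source bitmap. */
static int entries_to_drain(unsigned int source_bitmap)
{
	int entries = 0;

	while (source_bitmap) {
		/* clear the lowest set bit: one source delivered */
		source_bitmap &= source_bitmap - 1;
		entries++;
	}
	return entries;
}
```

Since multi-source SGIs are rare (mostly heavily overcommitted hosts), this per-source cost is the price being accepted in exchange for architectural correctness.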

>
> Another thing I dislike about using EOI for that is that it forces us to
> propagate the knowledge of the multi-SGI horror further down the
> stack, down to both implementations of vgic_populate_lr. NPIE allows
> us to keep that knowledge local. But that's an orthogonal issue, and
> we can further argue/bikeshed about the respective merits of both
> solutions once we have something that fits the sorry state of the
> GICv2 architecture ;-).
>

Yeah, I don't care deeply.  If NPIE is prettier in the
implementations, let's do that.

Thanks,
-Christoffer

Patch

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index c7c5ef190afa..1f7ff175f47b 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -720,18 +720,9 @@  static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
 	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
 		spin_lock(&irq->irq_lock);
 
-		if (unlikely(vgic_target_oracle(irq) != vcpu))
-			goto next;
-
-		/*
-		 * If we get an SGI with multiple sources, try to get
-		 * them in all at once.
-		 */
-		do {
+		if (likely(vgic_target_oracle(irq) == vcpu))
 			vgic_populate_lr(vcpu, irq, count++);
-		} while (irq->source && count < kvm_vgic_global_state.nr_lr);
 
-next:
 		spin_unlock(&irq->irq_lock);
 
 		if (count == kvm_vgic_global_state.nr_lr) {