diff mbox series

[2/3] kvm: arm: VGIC: Scan all IRQs when interrupt group gets enabled

Message ID 20191108174952.740-3-andre.przywara@arm.com (mailing list archive)
State New, archived
Headers show
Series kvm: arm: VGIC: Fix interrupt group enablement | expand

Commit Message

Andre Przywara Nov. 8, 2019, 5:49 p.m. UTC
Our current VGIC emulation code treats the "EnableGrpX" bits in GICD_CTLR
as a single global interrupt delivery switch, where in fact the GIC
architecture asks for this being separate for the two interrupt groups.

To implement this properly, we have to slightly adjust our design, to
*not* let IRQs from a disabled interrupt group be added to the ap_list.

As a consequence, enabling one group requires us to re-evaluate every
pending IRQ and potentially add it to its respective ap_list. Similarly
disabling an interrupt group requires pending IRQs to be removed from
the ap_list (as long as they have not been activated yet).

Implement a rather simple, yet not terribly efficient algorithm to
achieve this: For each VCPU we iterate over all IRQs, checking for
pending ones and adding them to the list. We hold the ap_list_lock
for this, to make this atomic from a VCPU's point of view.

When an interrupt group gets disabled, we can't directly remove affected
IRQs from the ap_list, as a running VCPU might have already activated
them, which wouldn't be immediately visible to the host.
Instead simply kick all VCPUs, so that they clean their ap_list's
automatically when running vgic_prune_ap_list().

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 80 insertions(+), 8 deletions(-)

Comments

Marc Zyngier Nov. 10, 2019, 2:29 p.m. UTC | #1
On Fri,  8 Nov 2019 17:49:51 +0000
Andre Przywara <andre.przywara@arm.com> wrote:

> Our current VGIC emulation code treats the "EnableGrpX" bits in GICD_CTLR
> as a single global interrupt delivery switch, where in fact the GIC
> architecture asks for this being separate for the two interrupt groups.
> 
> To implement this properly, we have to slightly adjust our design, to
> *not* let IRQs from a disabled interrupt group be added to the ap_list.
> 
> As a consequence, enabling one group requires us to re-evaluate every
> pending IRQ and potentially add it to its respective ap_list. Similarly
> disabling an interrupt group requires pending IRQs to be removed from
> the ap_list (as long as they have not been activated yet).
> 
> Implement a rather simple, yet not terribly efficient algorithm to
> achieve this: For each VCPU we iterate over all IRQs, checking for
> pending ones and adding them to the list. We hold the ap_list_lock
> for this, to make this atomic from a VCPU's point of view.
> 
> When an interrupt group gets disabled, we can't directly remove affected
> IRQs from the ap_list, as a running VCPU might have already activated
> them, which wouldn't be immediately visible to the host.
> Instead simply kick all VCPUs, so that they clean their ap_list's
> automatically when running vgic_prune_ap_list().
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 80 insertions(+), 8 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 3b88e14d239f..28d9ff282017 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -339,6 +339,38 @@ int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
>  	return 0;
>  }
>  
> +/*
> + * Check whether a given IRQs need to be queued to this ap_list, and do
> + * so if that's the case.
> + * Requires the ap_list_lock to be held (but not the irq lock).
> + *
> + * Returns 1 if that IRQ has been added to the ap_list, and 0 if not.
> + */
> +static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> +			     int intid)

true/false seems better than 1/0.

> +{
> +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
> +	int ret = 0;
> +
> +	raw_spin_lock(&irq->irq_lock);
> +	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
> +		/*
> +		 * Grab a reference to the irq to reflect the
> +		 * fact that it is now in the ap_list.
> +		 */
> +		vgic_get_irq_kref(irq);
> +		list_add_tail(&irq->ap_list,
> +			      &vcpu->arch.vgic_cpu.ap_list_head);

Two things:
- This should be the job of vgic_queue_irq_unlock. Why are you
  open-coding it?
- What if the interrupt isn't pending? Non-pending, non-active
  interrupts should not be on the AP list!

> +		irq->vcpu = vcpu;
> +
> +		ret = 1;
> +	}
> +	raw_spin_unlock(&irq->irq_lock);
> +	vgic_put_irq(kvm, irq);
> +
> +	return ret;
> +}
> +
>  /*
>   * The group enable status of at least one of the groups has changed.
>   * If enabled is true, at least one of the groups got enabled.
> @@ -346,17 +378,57 @@ int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
>   */
>  void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
>  {
> +	int cpuid;
> +	struct kvm_vcpu *vcpu;
> +
>  	/*
> -	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
> -	 * If a pending IRQ's group is now enabled, add it to its ap_list.
> -	 * If a pending IRQ's group is now disabled, kick the VCPU to
> -	 * let it remove this IRQ from its ap_list. We have to let the
> -	 * VCPU do it itself, because we can't know the exact state of an
> -	 * IRQ pending on a running VCPU.
> +	 * If no group got enabled, we only have to potentially remove
> +	 * interrupts from ap_lists. We can't do this here, because a running
> +	 * VCPU might have ACKed an IRQ already, which wouldn't immediately
> +	 * be reflected in the ap_list.
> +	 * So kick all VCPUs, which will let them re-evaluate their ap_lists
> +	 * by running vgic_prune_ap_list(), removing no longer enabled
> +	 * IRQs.
> +	 */
> +	if (!enabled) {
> +		vgic_kick_vcpus(kvm);
> +
> +		return;
> +	}
> +
> +	/*
> +	 * At least one group went from disabled to enabled. Now we need
> +	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
> +	 * If a pending IRQ's group is now enabled, add it to the ap_list.
> +	 *
> +	 * For each VCPU this needs to be atomic, as we need *all* newly
> +	 * enabled IRQs in be in the ap_list to determine the highest
> +	 * priority one.
> +	 * So grab the ap_list_lock, then iterate over all private IRQs and
> +	 * all SPIs. Once the ap_list is updated, kick that VCPU to
> +	 * forward any new IRQs to the guest.
>  	 */
> +	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
> +		unsigned long flags;
> +		int i;
>  
> -	 /* For now just kick all VCPUs, as the old code did. */
> -	vgic_kick_vcpus(kvm);
> +		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> +
> +		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
> +			queue_enabled_irq(kvm, vcpu, i);
> +
> +		for (i = VGIC_NR_PRIVATE_IRQS;
> +		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
> +			queue_enabled_irq(kvm, vcpu, i);

On top of my questions above, what happens to LPIs? And if a group has
been disabled, how do you retire these interrupts from the AP list?

> +
> +                raw_spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock,
> +                                           flags);
> +
> +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> +			kvm_vcpu_kick(vcpu);
> +		}
> +	}
>  }
>  
>  bool vgic_dist_group_enabled(struct kvm *kvm, int group)

Thanks,

	M.
Andre Przywara Nov. 12, 2019, 9:36 a.m. UTC | #2
On Sun, 10 Nov 2019 14:29:14 +0000
Marc Zyngier <maz@kernel.org> wrote:

Hi Marc,

> On Fri,  8 Nov 2019 17:49:51 +0000
> Andre Przywara <andre.przywara@arm.com> wrote:
> 
> > Our current VGIC emulation code treats the "EnableGrpX" bits in GICD_CTLR
> > as a single global interrupt delivery switch, where in fact the GIC
> > architecture asks for this being separate for the two interrupt groups.
> > 
> > To implement this properly, we have to slightly adjust our design, to
> > *not* let IRQs from a disabled interrupt group be added to the ap_list.
> > 
> > As a consequence, enabling one group requires us to re-evaluate every
> > pending IRQ and potentially add it to its respective ap_list. Similarly
> > disabling an interrupt group requires pending IRQs to be removed from
> > the ap_list (as long as they have not been activated yet).
> > 
> > Implement a rather simple, yet not terribly efficient algorithm to
> > achieve this: For each VCPU we iterate over all IRQs, checking for
> > pending ones and adding them to the list. We hold the ap_list_lock
> > for this, to make this atomic from a VCPU's point of view.
> > 
> > When an interrupt group gets disabled, we can't directly remove affected
> > IRQs from the ap_list, as a running VCPU might have already activated
> > them, which wouldn't be immediately visible to the host.
> > Instead simply kick all VCPUs, so that they clean their ap_list's
> > automatically when running vgic_prune_ap_list().
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > ---
> >  virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 80 insertions(+), 8 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 3b88e14d239f..28d9ff282017 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -339,6 +339,38 @@ int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
> >  	return 0;
> >  }
> >  
> > +/*
> > + * Check whether a given IRQs need to be queued to this ap_list, and do
> > + * so if that's the case.
> > + * Requires the ap_list_lock to be held (but not the irq lock).
> > + *
> > + * Returns 1 if that IRQ has been added to the ap_list, and 0 if not.
> > + */
> > +static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> > +			     int intid)  
> 
> true/false seems better than 1/0.

Mmh, indeed. I think I had more in there in an earlier version.

> > +{
> > +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
> > +	int ret = 0;
> > +
> > +	raw_spin_lock(&irq->irq_lock);
> > +	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
> > +		/*
> > +		 * Grab a reference to the irq to reflect the
> > +		 * fact that it is now in the ap_list.
> > +		 */
> > +		vgic_get_irq_kref(irq);
> > +		list_add_tail(&irq->ap_list,
> > +			      &vcpu->arch.vgic_cpu.ap_list_head);  
> 
> Two things:
> - This should be the job of vgic_queue_irq_unlock. Why are you
>   open-coding it?

I was *really* keen on reusing that, but couldn't  for two reasons:
a) the locking code inside vgic_queue_irq_unlock spoils that: It requires the irq_lock to be held, but not the ap_list_lock. Then it takes both locks, but returns with both of them dropped. We need to hold the ap_list_lock all of the time, to prevent any VCPU returning to the HV to interfere with this routine.
b) vgic_queue_irq_unlock() kicks the VCPU already, where I want to just add all of them first, then kick the VCPU at the end.

So I decided to go with the stripped-down version of it, because I didn't dare to touch the original function. I could refactor this "actually add to the list" part of vgic_queue_irq_unlock() into this new function, then call it from both vgic_queue_irq_unlock() and from the new users.

> - What if the interrupt isn't pending? Non-pending, non-active
>   interrupts should not be on the AP list!

That should be covered by vgic_target_oracle() already, shouldn't it?

> > +		irq->vcpu = vcpu;
> > +
> > +		ret = 1;
> > +	}
> > +	raw_spin_unlock(&irq->irq_lock);
> > +	vgic_put_irq(kvm, irq);
> > +
> > +	return ret;
> > +}
> > +
> >  /*
> >   * The group enable status of at least one of the groups has changed.
> >   * If enabled is true, at least one of the groups got enabled.
> > @@ -346,17 +378,57 @@ int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
> >   */
> >  void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
> >  {
> > +	int cpuid;
> > +	struct kvm_vcpu *vcpu;
> > +
> >  	/*
> > -	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
> > -	 * If a pending IRQ's group is now enabled, add it to its ap_list.
> > -	 * If a pending IRQ's group is now disabled, kick the VCPU to
> > -	 * let it remove this IRQ from its ap_list. We have to let the
> > -	 * VCPU do it itself, because we can't know the exact state of an
> > -	 * IRQ pending on a running VCPU.
> > +	 * If no group got enabled, we only have to potentially remove
> > +	 * interrupts from ap_lists. We can't do this here, because a running
> > +	 * VCPU might have ACKed an IRQ already, which wouldn't immediately
> > +	 * be reflected in the ap_list.
> > +	 * So kick all VCPUs, which will let them re-evaluate their ap_lists
> > +	 * by running vgic_prune_ap_list(), removing no longer enabled
> > +	 * IRQs.
> > +	 */
> > +	if (!enabled) {
> > +		vgic_kick_vcpus(kvm);
> > +
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * At least one group went from disabled to enabled. Now we need
> > +	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
> > +	 * If a pending IRQ's group is now enabled, add it to the ap_list.
> > +	 *
> > +	 * For each VCPU this needs to be atomic, as we need *all* newly
> > +	 * enabled IRQs in be in the ap_list to determine the highest
> > +	 * priority one.
> > +	 * So grab the ap_list_lock, then iterate over all private IRQs and
> > +	 * all SPIs. Once the ap_list is updated, kick that VCPU to
> > +	 * forward any new IRQs to the guest.
> >  	 */
> > +	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
> > +		unsigned long flags;
> > +		int i;
> >  
> > -	 /* For now just kick all VCPUs, as the old code did. */
> > -	vgic_kick_vcpus(kvm);
> > +		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> > +
> > +		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
> > +			queue_enabled_irq(kvm, vcpu, i);
> > +
> > +		for (i = VGIC_NR_PRIVATE_IRQS;
> > +		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
> > +			queue_enabled_irq(kvm, vcpu, i);  
> 
> On top of my questions above, what happens to LPIs?

Oh dear. Looks like wishful thinking on my side ;-) Iterating over all interrupts is probably not a good idea anymore.
Do you think this idea of having a list with group-disabled IRQs is a better approach: In vgic_queue_irq_unlock, if a pending IRQ's group is enabled, it goes into the ap_list, if not, it goes into another list instead. Then we would only need to consult this other list when a group gets enabled. Both lists protected by the same ap_list_lock. Does that make sense?

> And if a group has
> been disabled, how do you retire these interrupts from the AP list?

This is done above: we kick the respective VCPU and rely on vgic_prune_ap_list() to remove them (that uses vgic_target_oracle(), which in turn checks vgic_irq_is_grp_enabled()).

Cheers,
Andre.
 
> > +
> > +                raw_spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock,
> > +                                           flags);
> > +
> > +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > +			kvm_vcpu_kick(vcpu);
> > +		}
> > +	}
> >  }
> >  
> >  bool vgic_dist_group_enabled(struct kvm *kvm, int group)  
> 
> Thanks,
> 
> 	M.
Marc Zyngier Nov. 14, 2019, 11:16 a.m. UTC | #3
On 2019-11-12 09:36, Andre Przywara wrote:
> On Sun, 10 Nov 2019 14:29:14 +0000
> Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Marc,
>
>> On Fri,  8 Nov 2019 17:49:51 +0000
>> Andre Przywara <andre.przywara@arm.com> wrote:
>>
>> > Our current VGIC emulation code treats the "EnableGrpX" bits in 
>> GICD_CTLR
>> > as a single global interrupt delivery switch, where in fact the 
>> GIC
>> > architecture asks for this being separate for the two interrupt 
>> groups.
>> >
>> > To implement this properly, we have to slightly adjust our design, 
>> to
>> > *not* let IRQs from a disabled interrupt group be added to the 
>> ap_list.
>> >
>> > As a consequence, enabling one group requires us to re-evaluate 
>> every
>> > pending IRQ and potentially add it to its respective ap_list. 
>> Similarly
>> > disabling an interrupt group requires pending IRQs to be removed 
>> from
>> > the ap_list (as long as they have not been activated yet).
>> >
>> > Implement a rather simple, yet not terribly efficient algorithm to
>> > achieve this: For each VCPU we iterate over all IRQs, checking for
>> > pending ones and adding them to the list. We hold the ap_list_lock
>> > for this, to make this atomic from a VCPU's point of view.
>> >
>> > When an interrupt group gets disabled, we can't directly remove 
>> affected
>> > IRQs from the ap_list, as a running VCPU might have already 
>> activated
>> > them, which wouldn't be immediately visible to the host.
>> > Instead simply kick all VCPUs, so that they clean their ap_list's
>> > automatically when running vgic_prune_ap_list().
>> >
>> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> > ---
>> >  virt/kvm/arm/vgic/vgic.c | 88 
>> ++++++++++++++++++++++++++++++++++++----
>> >  1 file changed, 80 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> > index 3b88e14d239f..28d9ff282017 100644
>> > --- a/virt/kvm/arm/vgic/vgic.c
>> > +++ b/virt/kvm/arm/vgic/vgic.c
>> > @@ -339,6 +339,38 @@ int vgic_dist_enable_group(struct kvm *kvm, 
>> int group, bool status)
>> >  	return 0;
>> >  }
>> >
>> > +/*
>> > + * Check whether a given IRQs need to be queued to this ap_list, 
>> and do
>> > + * so if that's the case.
>> > + * Requires the ap_list_lock to be held (but not the irq lock).
>> > + *
>> > + * Returns 1 if that IRQ has been added to the ap_list, and 0 if 
>> not.
>> > + */
>> > +static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu 
>> *vcpu,
>> > +			     int intid)
>>
>> true/false seems better than 1/0.
>
> Mmh, indeed. I think I had more in there in an earlier version.
>
>> > +{
>> > +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
>> > +	int ret = 0;
>> > +
>> > +	raw_spin_lock(&irq->irq_lock);
>> > +	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
>> > +		/*
>> > +		 * Grab a reference to the irq to reflect the
>> > +		 * fact that it is now in the ap_list.
>> > +		 */
>> > +		vgic_get_irq_kref(irq);
>> > +		list_add_tail(&irq->ap_list,
>> > +			      &vcpu->arch.vgic_cpu.ap_list_head);
>>
>> Two things:
>> - This should be the job of vgic_queue_irq_unlock. Why are you
>>   open-coding it?
>
> I was *really* keen on reusing that, but couldn't  for two reasons:
> a) the locking code inside vgic_queue_irq_unlock spoils that: It
> requires the irq_lock to be held, but not the ap_list_lock. Then it
> takes both locks, but returns with both of them dropped. We need to
> hold the ap_list_lock all of the time, to prevent any VCPU returning
> to the HV to interfere with this routine.
> b) vgic_queue_irq_unlock() kicks the VCPU already, where I want to
> just add all of them first, then kick the VCPU at the end.

Indeed, and that is why you need to change the way you queue these
pending, enabled, group-disabled interrupts (see the LPI issue below).

>
> So I decided to go with the stripped-down version of it, because I
> didn't dare to touch the original function. I could refactor this
> "actually add to the list" part of vgic_queue_irq_unlock() into this
> new function, then call it from both vgic_queue_irq_unlock() and from
> the new users.
>
>> - What if the interrupt isn't pending? Non-pending, non-active
>>   interrupts should not be on the AP list!
>
> That should be covered by vgic_target_oracle() already, shouldn't it?

Ah, yes, you're right.

>
>> > +		irq->vcpu = vcpu;
>> > +
>> > +		ret = 1;
>> > +	}
>> > +	raw_spin_unlock(&irq->irq_lock);
>> > +	vgic_put_irq(kvm, irq);
>> > +
>> > +	return ret;
>> > +}
>> > +
>> >  /*
>> >   * The group enable status of at least one of the groups has 
>> changed.
>> >   * If enabled is true, at least one of the groups got enabled.
>> > @@ -346,17 +378,57 @@ int vgic_dist_enable_group(struct kvm *kvm, 
>> int group, bool status)
>> >   */
>> >  void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
>> >  {
>> > +	int cpuid;
>> > +	struct kvm_vcpu *vcpu;
>> > +
>> >  	/*
>> > -	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
>> > -	 * If a pending IRQ's group is now enabled, add it to its 
>> ap_list.
>> > -	 * If a pending IRQ's group is now disabled, kick the VCPU to
>> > -	 * let it remove this IRQ from its ap_list. We have to let the
>> > -	 * VCPU do it itself, because we can't know the exact state of 
>> an
>> > -	 * IRQ pending on a running VCPU.
>> > +	 * If no group got enabled, we only have to potentially remove
>> > +	 * interrupts from ap_lists. We can't do this here, because a 
>> running
>> > +	 * VCPU might have ACKed an IRQ already, which wouldn't 
>> immediately
>> > +	 * be reflected in the ap_list.
>> > +	 * So kick all VCPUs, which will let them re-evaluate their 
>> ap_lists
>> > +	 * by running vgic_prune_ap_list(), removing no longer enabled
>> > +	 * IRQs.
>> > +	 */
>> > +	if (!enabled) {
>> > +		vgic_kick_vcpus(kvm);
>> > +
>> > +		return;
>> > +	}
>> > +
>> > +	/*
>> > +	 * At least one group went from disabled to enabled. Now we need
>> > +	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
>> > +	 * If a pending IRQ's group is now enabled, add it to the 
>> ap_list.
>> > +	 *
>> > +	 * For each VCPU this needs to be atomic, as we need *all* newly
>> > +	 * enabled IRQs in be in the ap_list to determine the highest
>> > +	 * priority one.
>> > +	 * So grab the ap_list_lock, then iterate over all private IRQs 
>> and
>> > +	 * all SPIs. Once the ap_list is updated, kick that VCPU to
>> > +	 * forward any new IRQs to the guest.
>> >  	 */
>> > +	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
>> > +		unsigned long flags;
>> > +		int i;
>> >
>> > -	 /* For now just kick all VCPUs, as the old code did. */
>> > -	vgic_kick_vcpus(kvm);
>> > +		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, 
>> flags);
>> > +
>> > +		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
>> > +			queue_enabled_irq(kvm, vcpu, i);
>> > +
>> > +		for (i = VGIC_NR_PRIVATE_IRQS;
>> > +		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
>> > +			queue_enabled_irq(kvm, vcpu, i);
>>
>> On top of my questions above, what happens to LPIs?
>
> Oh dear. Looks like wishful thinking on my side ;-) Iterating over
> all interrupts is probably not a good idea anymore.
> Do you think this idea of having a list with group-disabled IRQs is a
> better approach: In vgic_queue_irq_unlock, if a pending IRQ's group 
> is
> enabled, it goes into the ap_list, if not, it goes into another list
> instead. Then we would only need to consult this other list when a
> group gets enabled. Both lists protected by the same ap_list_lock.
> Does that make sense?

I think that could work. One queue for each group, holding pending,
enabled, group-disabled interrupts. Pending, disabled interrupts are
not queued anywhere, just like today.

The only snag is per-cpu interrupts. On which queue do they live?
Do you have per-CPU queues? or a global one?

>> And if a group has
>> been disabled, how do you retire these interrupts from the AP list?
>
> This is done above: we kick the respective VCPU and rely on
> vgic_prune_ap_list() to remove them (that uses vgic_target_oracle(),
> which in turn checks vgic_irq_is_grp_enabled()).

But what if the CPU isn't running? Kicking it isn't going to do much,
is it?

Thanks,

         M.
Andre Przywara Nov. 18, 2019, 2:12 p.m. UTC | #4
On Thu, 14 Nov 2019 11:16:55 +0000
Marc Zyngier <maz@kernel.org> wrote:

Hi Marc,

> On 2019-11-12 09:36, Andre Przywara wrote:
> > On Sun, 10 Nov 2019 14:29:14 +0000
> > Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Marc,
> >  
> >> On Fri,  8 Nov 2019 17:49:51 +0000
> >> Andre Przywara <andre.przywara@arm.com> wrote:
> >>  
> >> > Our current VGIC emulation code treats the "EnableGrpX" bits in   
> >> GICD_CTLR  
> >> > as a single global interrupt delivery switch, where in fact the   
> >> GIC  
> >> > architecture asks for this being separate for the two interrupt   
> >> groups.  
> >> >
> >> > To implement this properly, we have to slightly adjust our design,   
> >> to  
> >> > *not* let IRQs from a disabled interrupt group be added to the   
> >> ap_list.  
> >> >
> >> > As a consequence, enabling one group requires us to re-evaluate   
> >> every  
> >> > pending IRQ and potentially add it to its respective ap_list.   
> >> Similarly  
> >> > disabling an interrupt group requires pending IRQs to be removed   
> >> from  
> >> > the ap_list (as long as they have not been activated yet).
> >> >
> >> > Implement a rather simple, yet not terribly efficient algorithm to
> >> > achieve this: For each VCPU we iterate over all IRQs, checking for
> >> > pending ones and adding them to the list. We hold the ap_list_lock
> >> > for this, to make this atomic from a VCPU's point of view.
> >> >
> >> > When an interrupt group gets disabled, we can't directly remove   
> >> affected  
> >> > IRQs from the ap_list, as a running VCPU might have already   
> >> activated  
> >> > them, which wouldn't be immediately visible to the host.
> >> > Instead simply kick all VCPUs, so that they clean their ap_list's
> >> > automatically when running vgic_prune_ap_list().
> >> >
> >> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> >> > ---
> >> >  virt/kvm/arm/vgic/vgic.c | 88   
> >> ++++++++++++++++++++++++++++++++++++----  
> >> >  1 file changed, 80 insertions(+), 8 deletions(-)
> >> >
> >> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> >> > index 3b88e14d239f..28d9ff282017 100644
> >> > --- a/virt/kvm/arm/vgic/vgic.c
> >> > +++ b/virt/kvm/arm/vgic/vgic.c
> >> > @@ -339,6 +339,38 @@ int vgic_dist_enable_group(struct kvm *kvm,   
> >> int group, bool status)  
> >> >  	return 0;
> >> >  }
> >> >
> >> > +/*
> >> > + * Check whether a given IRQs need to be queued to this ap_list,   
> >> and do  
> >> > + * so if that's the case.
> >> > + * Requires the ap_list_lock to be held (but not the irq lock).
> >> > + *
> >> > + * Returns 1 if that IRQ has been added to the ap_list, and 0 if   
> >> not.  
> >> > + */
> >> > +static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu   
> >> *vcpu,  
> >> > +			     int intid)  
> >>
> >> true/false seems better than 1/0.  
> >
> > Mmh, indeed. I think I had more in there in an earlier version.
> >  
> >> > +{
> >> > +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
> >> > +	int ret = 0;
> >> > +
> >> > +	raw_spin_lock(&irq->irq_lock);
> >> > +	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
> >> > +		/*
> >> > +		 * Grab a reference to the irq to reflect the
> >> > +		 * fact that it is now in the ap_list.
> >> > +		 */
> >> > +		vgic_get_irq_kref(irq);
> >> > +		list_add_tail(&irq->ap_list,
> >> > +			      &vcpu->arch.vgic_cpu.ap_list_head);  
> >>
> >> Two things:
> >> - This should be the job of vgic_queue_irq_unlock. Why are you
> >>   open-coding it?  
> >
> > I was *really* keen on reusing that, but couldn't  for two reasons:
> > a) the locking code inside vgic_queue_irq_unlock spoils that: It
> > requires the irq_lock to be held, but not the ap_list_lock. Then it
> > takes both locks, but returns with both of them dropped. We need to
> > hold the ap_list_lock all of the time, to prevent any VCPU returning
> > to the HV to interfere with this routine.
> > b) vgic_queue_irq_unlock() kicks the VCPU already, where I want to
> > just add all of them first, then kick the VCPU at the end.  
> 
> Indeed, and that is why you need to change the way you queue these
> pending, enabled, group-disabled interrupts (see the LPI issue below).
> 
> >
> > So I decided to go with the stripped-down version of it, because I
> > didn't dare to touch the original function. I could refactor this
> > "actually add to the list" part of vgic_queue_irq_unlock() into this
> > new function, then call it from both vgic_queue_irq_unlock() and from
> > the new users.
> >  
> >> - What if the interrupt isn't pending? Non-pending, non-active
> >>   interrupts should not be on the AP list!  
> >
> > That should be covered by vgic_target_oracle() already, shouldn't it?  
> 
> Ah, yes, you're right.
> 
> >  
> >> > +		irq->vcpu = vcpu;
> >> > +
> >> > +		ret = 1;
> >> > +	}
> >> > +	raw_spin_unlock(&irq->irq_lock);
> >> > +	vgic_put_irq(kvm, irq);
> >> > +
> >> > +	return ret;
> >> > +}
> >> > +
> >> >  /*
> >> >   * The group enable status of at least one of the groups has   
> >> changed.  
> >> >   * If enabled is true, at least one of the groups got enabled.
> >> > @@ -346,17 +378,57 @@ int vgic_dist_enable_group(struct kvm *kvm,   
> >> int group, bool status)  
> >> >   */
> >> >  void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
> >> >  {
> >> > +	int cpuid;
> >> > +	struct kvm_vcpu *vcpu;
> >> > +
> >> >  	/*
> >> > -	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
> >> > -	 * If a pending IRQ's group is now enabled, add it to its   
> >> ap_list.  
> >> > -	 * If a pending IRQ's group is now disabled, kick the VCPU to
> >> > -	 * let it remove this IRQ from its ap_list. We have to let the
> >> > -	 * VCPU do it itself, because we can't know the exact state of   
> >> an  
> >> > -	 * IRQ pending on a running VCPU.
> >> > +	 * If no group got enabled, we only have to potentially remove
> >> > +	 * interrupts from ap_lists. We can't do this here, because a   
> >> running  
> >> > +	 * VCPU might have ACKed an IRQ already, which wouldn't   
> >> immediately  
> >> > +	 * be reflected in the ap_list.
> >> > +	 * So kick all VCPUs, which will let them re-evaluate their   
> >> ap_lists  
> >> > +	 * by running vgic_prune_ap_list(), removing no longer enabled
> >> > +	 * IRQs.
> >> > +	 */
> >> > +	if (!enabled) {
> >> > +		vgic_kick_vcpus(kvm);
> >> > +
> >> > +		return;
> >> > +	}
> >> > +
> >> > +	/*
> >> > +	 * At least one group went from disabled to enabled. Now we need
> >> > +	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
> >> > +	 * If a pending IRQ's group is now enabled, add it to the   
> >> ap_list.  
> >> > +	 *
> >> > +	 * For each VCPU this needs to be atomic, as we need *all* newly
> >> > +	 * enabled IRQs in be in the ap_list to determine the highest
> >> > +	 * priority one.
> >> > +	 * So grab the ap_list_lock, then iterate over all private IRQs   
> >> and  
> >> > +	 * all SPIs. Once the ap_list is updated, kick that VCPU to
> >> > +	 * forward any new IRQs to the guest.
> >> >  	 */
> >> > +	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
> >> > +		unsigned long flags;
> >> > +		int i;
> >> >
> >> > -	 /* For now just kick all VCPUs, as the old code did. */
> >> > -	vgic_kick_vcpus(kvm);
> >> > +		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock,   
> >> flags);  
> >> > +
> >> > +		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
> >> > +			queue_enabled_irq(kvm, vcpu, i);
> >> > +
> >> > +		for (i = VGIC_NR_PRIVATE_IRQS;
> >> > +		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
> >> > +			queue_enabled_irq(kvm, vcpu, i);  
> >>
> >> On top of my questions above, what happens to LPIs?  
> >
> > Oh dear. Looks like wishful thinking on my side ;-) Iterating over
> > all interrupts is probably not a good idea anymore.
> > Do you think this idea of having a list with group-disabled IRQs is a
> > better approach: In vgic_queue_irq_unlock, if a pending IRQ's group 
> > is
> > enabled, it goes into the ap_list, if not, it goes into another list
> > instead. Then we would only need to consult this other list when a
> > group gets enabled. Both lists protected by the same ap_list_lock.
> > Does that make sense?  
> 
> I think that could work. One queue for each group, holding pending,
> enabled, group-disabled interrupts. Pending, disabled interrupts are
> not queued anywhere, just like today.
> 
> The only snag is per-cpu interrupts. On which queue do they live?
> Do you have per-CPU queues? or a global one?

Yes, the idea was to have a per-VCPU "grp_dis_list" in addition to the ap_list, reusing the ap_list list_head in struct vgic_irq. vgic_queue_irq_unlock() would put them into *one* of those two lists, depending on their group-enabled status. When a group gets enabled, we just have to transfer the IRQs from grp_dis_list to ap_list.

But fleshing this out I was wondering if it couldn't be much simpler:
We ignore the group-enabled status most of the time, except in vgic_flush_lr_state(). So group-disabled IRQs *would go* to the ap_list (when they are otherwise pending|active and enabled), but would be skipped when eventually populating the LRs. vgic_prune_ap_list would also not touch them, so they would stay in the ap_list (unless removed for other reasons).

That might raise some eyebrows (because we keep IRQs in the ap_list which are not ready), but would require only minimal changes and avoid all kind of nasty/racy code to be added. The only downside I see is that the ap_list could potentially be much longer, but we could change the sorting algorithm if needed to keep group-disabled IRQs at the end, at which point it wouldn't really matter.

Do you see any problem with that approach? Alex seemed to remember that you had an objection against a very similar (if not identical) idea before.

> >> And if a group has
> >> been disabled, how do you retire these interrupts from the AP list?  
> >
> > This is done above: we kick the respective VCPU and rely on
> > vgic_prune_ap_list() to remove them (that uses vgic_target_oracle(),
> > which in turn checks vgic_irq_is_grp_enabled()).  
> 
> But what if the CPU isn't running? Kicking it isn't going to do much,
> is it?

Not directly, but in either approach that would be handled similar to disabled interrupts: once the VCPU runs, they would *not* end up in LRs (because we check the oracle before), and would be cleaned up in prune() once the guest exits (at least for the original approach).

Cheers,
Andre.
Marc Zyngier Nov. 19, 2019, 9:40 a.m. UTC | #5
Hi Andre,

On 2019-11-18 14:12, Andre Przywara wrote:
> On Thu, 14 Nov 2019 11:16:55 +0000
> Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Marc,
>
>> On 2019-11-12 09:36, Andre Przywara wrote:
>> > On Sun, 10 Nov 2019 14:29:14 +0000
>> > Marc Zyngier <maz@kernel.org> wrote:
>> >
>> > Hi Marc,
>> >
>> >> On Fri,  8 Nov 2019 17:49:51 +0000
>> >> Andre Przywara <andre.przywara@arm.com> wrote:
>> >>
>> >> > Our current VGIC emulation code treats the "EnableGrpX" bits in
>> >> GICD_CTLR
>> >> > as a single global interrupt delivery switch, where in fact the
>> >> GIC
>> >> > architecture asks for this being separate for the two interrupt
>> >> groups.
>> >> >
>> >> > To implement this properly, we have to slightly adjust our 
>> design,
>> >> to
>> >> > *not* let IRQs from a disabled interrupt group be added to the
>> >> ap_list.
>> >> >
>> >> > As a consequence, enabling one group requires us to re-evaluate
>> >> every
>> >> > pending IRQ and potentially add it to its respective ap_list.
>> >> Similarly
>> >> > disabling an interrupt group requires pending IRQs to be 
>> removed
>> >> from
>> >> > the ap_list (as long as they have not been activated yet).
>> >> >
>> >> > Implement a rather simple, yet not terribly efficient algorithm 
>> to
>> >> > achieve this: For each VCPU we iterate over all IRQs, checking 
>> for
>> >> > pending ones and adding them to the list. We hold the 
>> ap_list_lock
>> >> > for this, to make this atomic from a VCPU's point of view.
>> >> >
>> >> > When an interrupt group gets disabled, we can't directly remove
>> >> affected
>> >> > IRQs from the ap_list, as a running VCPU might have already
>> >> activated
>> >> > them, which wouldn't be immediately visible to the host.
>> >> > Instead simply kick all VCPUs, so that they clean their 
>> ap_list's
>> >> > automatically when running vgic_prune_ap_list().
>> >> >
>> >> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> >> > ---
>> >> >  virt/kvm/arm/vgic/vgic.c | 88
>> >> ++++++++++++++++++++++++++++++++++++----
>> >> >  1 file changed, 80 insertions(+), 8 deletions(-)
>> >> >
>> >> > diff --git a/virt/kvm/arm/vgic/vgic.c 
>> b/virt/kvm/arm/vgic/vgic.c
>> >> > index 3b88e14d239f..28d9ff282017 100644
>> >> > --- a/virt/kvm/arm/vgic/vgic.c
>> >> > +++ b/virt/kvm/arm/vgic/vgic.c
>> >> > @@ -339,6 +339,38 @@ int vgic_dist_enable_group(struct kvm 
>> *kvm,
>> >> int group, bool status)
>> >> >  	return 0;
>> >> >  }
>> >> >
>> >> > +/*
>> >> > + * Check whether a given IRQs need to be queued to this 
>> ap_list,
>> >> and do
>> >> > + * so if that's the case.
>> >> > + * Requires the ap_list_lock to be held (but not the irq 
>> lock).
>> >> > + *
>> >> > + * Returns 1 if that IRQ has been added to the ap_list, and 0 
>> if
>> >> not.
>> >> > + */
>> >> > +static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu
>> >> *vcpu,
>> >> > +			     int intid)
>> >>
>> >> true/false seems better than 1/0.
>> >
>> > Mmh, indeed. I think I had more in there in an earlier version.
>> >
>> >> > +{
>> >> > +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
>> >> > +	int ret = 0;
>> >> > +
>> >> > +	raw_spin_lock(&irq->irq_lock);
>> >> > +	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
>> >> > +		/*
>> >> > +		 * Grab a reference to the irq to reflect the
>> >> > +		 * fact that it is now in the ap_list.
>> >> > +		 */
>> >> > +		vgic_get_irq_kref(irq);
>> >> > +		list_add_tail(&irq->ap_list,
>> >> > +			      &vcpu->arch.vgic_cpu.ap_list_head);
>> >>
>> >> Two things:
>> >> - This should be the job of vgic_queue_irq_unlock. Why are you
>> >>   open-coding it?
>> >
>> > I was *really* keen on reusing that, but couldn't  for two 
>> reasons:
>> > a) the locking code inside vgic_queue_irq_unlock spoils that: It
>> > requires the irq_lock to be held, but not the ap_list_lock. Then 
>> it
>> > takes both locks, but returns with both of them dropped. We need 
>> to
>> > hold the ap_list_lock all of the time, to prevent any VCPU 
>> returning
>> > to the HV to interfere with this routine.
>> > b) vgic_queue_irq_unlock() kicks the VCPU already, where I want to
>> > just add all of them first, then kick the VCPU at the end.
>>
>> Indeed, and that is why you need to change the way you queue these
>> pending, enabled, group-disabled interrupts (see the LPI issue 
>> below).
>>
>> >
>> > So I decided to go with the stripped-down version of it, because I
>> > didn't dare to touch the original function. I could refactor this
>> > "actually add to the list" part of vgic_queue_irq_unlock() into 
>> this
>> > new function, then call it from both vgic_queue_irq_unlock() and 
>> from
>> > the new users.
>> >
>> >> - What if the interrupt isn't pending? Non-pending, non-active
>> >>   interrupts should not be on the AP list!
>> >
>> > That should be covered by vgic_target_oracle() already, shouldn't 
>> it?
>>
>> Ah, yes, you're right.
>>
>> >
>> >> > +		irq->vcpu = vcpu;
>> >> > +
>> >> > +		ret = 1;
>> >> > +	}
>> >> > +	raw_spin_unlock(&irq->irq_lock);
>> >> > +	vgic_put_irq(kvm, irq);
>> >> > +
>> >> > +	return ret;
>> >> > +}
>> >> > +
>> >> >  /*
>> >> >   * The group enable status of at least one of the groups has
>> >> changed.
>> >> >   * If enabled is true, at least one of the groups got enabled.
>> >> > @@ -346,17 +378,57 @@ int vgic_dist_enable_group(struct kvm 
>> *kvm,
>> >> int group, bool status)
>> >> >   */
>> >> >  void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
>> >> >  {
>> >> > +	int cpuid;
>> >> > +	struct kvm_vcpu *vcpu;
>> >> > +
>> >> >  	/*
>> >> > -	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
>> >> > -	 * If a pending IRQ's group is now enabled, add it to its
>> >> ap_list.
>> >> > -	 * If a pending IRQ's group is now disabled, kick the VCPU to
>> >> > -	 * let it remove this IRQ from its ap_list. We have to let 
>> the
>> >> > -	 * VCPU do it itself, because we can't know the exact state 
>> of
>> >> an
>> >> > -	 * IRQ pending on a running VCPU.
>> >> > +	 * If no group got enabled, we only have to potentially 
>> remove
>> >> > +	 * interrupts from ap_lists. We can't do this here, because a
>> >> running
>> >> > +	 * VCPU might have ACKed an IRQ already, which wouldn't
>> >> immediately
>> >> > +	 * be reflected in the ap_list.
>> >> > +	 * So kick all VCPUs, which will let them re-evaluate their
>> >> ap_lists
>> >> > +	 * by running vgic_prune_ap_list(), removing no longer 
>> enabled
>> >> > +	 * IRQs.
>> >> > +	 */
>> >> > +	if (!enabled) {
>> >> > +		vgic_kick_vcpus(kvm);
>> >> > +
>> >> > +		return;
>> >> > +	}
>> >> > +
>> >> > +	/*
>> >> > +	 * At least one group went from disabled to enabled. Now we 
>> need
>> >> > +	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
>> >> > +	 * If a pending IRQ's group is now enabled, add it to the
>> >> ap_list.
>> >> > +	 *
>> >> > +	 * For each VCPU this needs to be atomic, as we need *all* 
>> newly
>> >> > +	 * enabled IRQs in be in the ap_list to determine the highest
>> >> > +	 * priority one.
>> >> > +	 * So grab the ap_list_lock, then iterate over all private 
>> IRQs
>> >> and
>> >> > +	 * all SPIs. Once the ap_list is updated, kick that VCPU to
>> >> > +	 * forward any new IRQs to the guest.
>> >> >  	 */
>> >> > +	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
>> >> > +		unsigned long flags;
>> >> > +		int i;
>> >> >
>> >> > -	 /* For now just kick all VCPUs, as the old code did. */
>> >> > -	vgic_kick_vcpus(kvm);
>> >> > +		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock,
>> >> flags);
>> >> > +
>> >> > +		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
>> >> > +			queue_enabled_irq(kvm, vcpu, i);
>> >> > +
>> >> > +		for (i = VGIC_NR_PRIVATE_IRQS;
>> >> > +		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
>> >> > +			queue_enabled_irq(kvm, vcpu, i);
>> >>
>> >> On top of my questions above, what happens to LPIs?
>> >
>> > Oh dear. Looks like wishful thinking on my side ;-) Iterating over
>> > all interrupts is probably not a good idea anymore.
>> > Do you think this idea of having a list with group-disabled IRQs 
>> is a
>> > better approach: In vgic_queue_irq_unlock, if a pending IRQ's 
>> group
>> > is
>> > enabled, it goes into the ap_list, if not, it goes into another 
>> list
>> > instead. Then we would only need to consult this other list when a
>> > group gets enabled. Both lists protected by the same ap_list_lock.
>> > Does that make sense?
>>
>> I think that could work. One queue for each group, holding pending,
>> enabled, group-disabled interrupts. Pending, disabled interrupts are
>> not queued anywhere, just like today.
>>
>> The only snag is per-cpu interrupts. On which queue do they live?
>> Do you have per-CPU queues? or a global one?
>
> Yes, the idea was to have a per-VCPU "grp_dis_list" in addition to
> the ap_list, reusing the ap_list list_head in struct vgic_irq.
> vgic_queue_irq_unlock() would put them into *one* of those two lists,
> depending on their group-enabled status. When a group gets enabled, 
> we
> just have to transfer the IRQs from grp_dis_list to ap_list.
>
> But fleshing this out I was wondering if it couldn't be much simpler:
> We ignore the group-enabled status most of the time, except in
> vgic_flush_lr_state(). So group-disabled IRQs *would go* to the
> ap_list (when they are otherwise pending|active and enabled), but
> would be skipped when eventually populating the LRs.
> vgic_prune_ap_list would also not touch them, so they would stay in
> the ap_list (unless removed for other reasons).
>
> That might raise some eyebrows (because we keep IRQs in the ap_list
> which are not ready), but would require only minimal changes and 
> avoid
> all kind of nasty/racy code to be added. The only downside I see is
> that the ap_list could potentially be much longer, but we could 
> change
> the sorting algorithm if needed to keep group-disabled IRQs at the
> end, at which point it wouldn't really matter.
>
> Do you see any problem with that approach? Alex seemed to remember
> that you had an objection against a very similar (if not identical)
> idea before.

My main worry with this is that it causes overhead on the fast path.
Disabled interrupts (for whichever reason they are disabled) shouldn't
have to be evaluated on the fast path.

Take for example kvm_vgic_vcpu_pending_irq(), which we evaluate pretty
often (each time a vcpu wakes up). Do we want to scan a bunch of
group-disabled interrupts there? No.

At the end of the day, what we're looking at is a list of disabled,
pending interrupts. They can be disabled for multiple reasons
(group is disabled, or interrupt itself is disabled). But they should
*not* end-up on the AP list, because that list has a precise semantic.

Your suggestion to add the group-disabled interrupts to the AP list
may be a cool hack, but it is mostly a hack that opens the whole thing
to a bunch of corner cases. Let's not do that.

>> >> And if a group has
>> >> been disabled, how do you retire these interrupts from the AP 
>> list?
>> >
>> > This is done above: we kick the respective VCPU and rely on
>> > vgic_prune_ap_list() to remove them (that uses 
>> vgic_target_oracle(),
>> > which in turn checks vgic_irq_is_grp_enabled()).
>>
>> But what if the CPU isn't running? Kicking it isn't going to do 
>> much,
>> is it?
>
> Not directly, but in either approach that would be handled similar to
> disabled interrupts: once the VCPU runs, they would *not* end up in
> LRs (because we check the oracle before), and would be cleaned up in
> prune() once the guest exits (at least for the original approach).

I lost track of the original approach already :-/

Try and build the above suggestion. It should follow the same flow as
the enabled, group-enabled interrupts, just with a different list.

Thanks,

         M.
Andre Przywara Nov. 19, 2019, 2:32 p.m. UTC | #6
On Tue, 19 Nov 2019 09:40:40 +0000
Marc Zyngier <maz@kernel.org> wrote:

Hi Marc,

[ ... ]

> >>
> >> I think that could work. One queue for each group, holding pending,
> >> enabled, group-disabled interrupts. Pending, disabled interrupts are
> >> not queued anywhere, just like today.
> >>
> >> The only snag is per-cpu interrupts. On which queue do they live?
> >> Do you have per-CPU queues? or a global one?  
> >
> > Yes, the idea was to have a per-VCPU "grp_dis_list" in addition to
> > the ap_list, reusing the ap_list list_head in struct vgic_irq.
> > vgic_queue_irq_unlock() would put them into *one* of those two lists,
> > depending on their group-enabled status. When a group gets enabled, 
> > we
> > just have to transfer the IRQs from grp_dis_list to ap_list.
> >
> > But fleshing this out I was wondering if it couldn't be much simpler:
> > We ignore the group-enabled status most of the time, except in
> > vgic_flush_lr_state(). So group-disabled IRQs *would go* to the
> > ap_list (when they are otherwise pending|active and enabled), but
> > would be skipped when eventually populating the LRs.
> > vgic_prune_ap_list would also not touch them, so they would stay in
> > the ap_list (unless removed for other reasons).
> >
> > That might raise some eyebrows (because we keep IRQs in the ap_list
> > which are not ready), but would require only minimal changes and 
> > avoid
> > all kind of nasty/racy code to be added. The only downside I see is
> > that the ap_list could potentially be much longer, but we could 
> > change
> > the sorting algorithm if needed to keep group-disabled IRQs at the
> > end, at which point it wouldn't really matter.
> >
> > Do you see any problem with that approach? Alex seemed to remember
> > that you had an objection against a very similar (if not identical)
> > idea before.  
> 
> My main worry with this is that it causes overhead on the fast path.
> Disabled interrupts (for whichever reason they are disabled) shouldn't
> have to be evaluated on the fast path.
> 
> Take for example kvm_vgic_vcpu_pending_irq(), which we evaluate pretty
> often (each time a vcpu wakes up). Do we want to scan a bunch of
> group-disabled interrupts there? No.
> 
> At the end of the day, what we're looking at is a list of disabled,
> pending interrupts. They can be disabled for multiple reasons
> (group is disabled, or interrupt itself is disabled). But they should
> *not* end-up on the AP list, because that list has a precise semantic.
> 
> Your suggestion to add the group-disabled interrupts to the AP list
> may be a cool hack, but it is mostly a hack that opens the whole thing
> to a bunch of corner cases. Let's not do that.

I understand what you are saying, and I had similar gripes. It was just too tempting to not give it a try ;-)
 
> >> >> And if a group has
> >> >> been disabled, how do you retire these interrupts from the AP   
> >> list?  
> >> >
> >> > This is done above: we kick the respective VCPU and rely on
> >> > vgic_prune_ap_list() to remove them (that uses   
> >> vgic_target_oracle(),  
> >> > which in turn checks vgic_irq_is_grp_enabled()).  
> >>
> >> But what if the CPU isn't running? Kicking it isn't going to do 
> >> much,
> >> is it?  
> >
> > Not directly, but in either approach that would be handled similar to
> > disabled interrupts: once the VCPU runs, they would *not* end up in
> > LRs (because we check the oracle before), and would be cleaned up in
> > prune() once the guest exits (at least for the original approach).  
> 
> I lost track of the original approach already :-/
> 
> Try and build the above suggestion. It should follow the same flow as
> the enabled, group-enabled interrupts, just with a different list.

OK, will do.

Thanks!

Andre.
diff mbox series

Patch

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 3b88e14d239f..28d9ff282017 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -339,6 +339,38 @@  int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
 	return 0;
 }
 
+/*
+ * Check whether a given IRQs need to be queued to this ap_list, and do
+ * so if that's the case.
+ * Requires the ap_list_lock to be held (but not the irq lock).
+ *
+ * Returns 1 if that IRQ has been added to the ap_list, and 0 if not.
+ */
+static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
+			     int intid)
+{
+	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
+	int ret = 0;
+
+	raw_spin_lock(&irq->irq_lock);
+	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
+		/*
+		 * Grab a reference to the irq to reflect the
+		 * fact that it is now in the ap_list.
+		 */
+		vgic_get_irq_kref(irq);
+		list_add_tail(&irq->ap_list,
+			      &vcpu->arch.vgic_cpu.ap_list_head);
+		irq->vcpu = vcpu;
+
+		ret = 1;
+	}
+	raw_spin_unlock(&irq->irq_lock);
+	vgic_put_irq(kvm, irq);
+
+	return ret;
+}
+
 /*
  * The group enable status of at least one of the groups has changed.
  * If enabled is true, at least one of the groups got enabled.
@@ -346,17 +378,57 @@  int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
  */
 void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
 {
+	int cpuid;
+	struct kvm_vcpu *vcpu;
+
 	/*
-	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
-	 * If a pending IRQ's group is now enabled, add it to its ap_list.
-	 * If a pending IRQ's group is now disabled, kick the VCPU to
-	 * let it remove this IRQ from its ap_list. We have to let the
-	 * VCPU do it itself, because we can't know the exact state of an
-	 * IRQ pending on a running VCPU.
+	 * If no group got enabled, we only have to potentially remove
+	 * interrupts from ap_lists. We can't do this here, because a running
+	 * VCPU might have ACKed an IRQ already, which wouldn't immediately
+	 * be reflected in the ap_list.
+	 * So kick all VCPUs, which will let them re-evaluate their ap_lists
+	 * by running vgic_prune_ap_list(), removing no longer enabled
+	 * IRQs.
+	 */
+	if (!enabled) {
+		vgic_kick_vcpus(kvm);
+
+		return;
+	}
+
+	/*
+	 * At least one group went from disabled to enabled. Now we need
+	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
+	 * If a pending IRQ's group is now enabled, add it to the ap_list.
+	 *
+	 * For each VCPU this needs to be atomic, as we need *all* newly
+	 * enabled IRQs in be in the ap_list to determine the highest
+	 * priority one.
+	 * So grab the ap_list_lock, then iterate over all private IRQs and
+	 * all SPIs. Once the ap_list is updated, kick that VCPU to
+	 * forward any new IRQs to the guest.
 	 */
+	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
+		unsigned long flags;
+		int i;
 
-	 /* For now just kick all VCPUs, as the old code did. */
-	vgic_kick_vcpus(kvm);
+		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
+
+		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
+			queue_enabled_irq(kvm, vcpu, i);
+
+		for (i = VGIC_NR_PRIVATE_IRQS;
+		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
+			queue_enabled_irq(kvm, vcpu, i);
+
+                raw_spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock,
+                                           flags);
+
+		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
+			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
+			kvm_vcpu_kick(vcpu);
+		}
+	}
 }
 
 bool vgic_dist_group_enabled(struct kvm *kvm, int group)