diff mbox

[RFC,2/4] KVM: arm/arm64: vgic: Support level-triggered mapped interrupts

Message ID 20170829093902.15379-3-cdall@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Christoffer Dall Aug. 29, 2017, 9:39 a.m. UTC
Level-triggered mapped IRQs are special because we only observe rising
edges as input to the VGIC, and we don't set the EOI flag and therefore
are not told when the level goes down, so that we can re-queue a new
interrupt when the level goes up.

One way to solve this problem is to side-step the logic of the VGIC and
special case the validation in the injection path, but it has the
unfortunate drawback of having to peak into the physical GIC state
whenever we want to know if the interrupt is pending on the virtual
distributor.

Instead, we can maintain the current semantics of a level triggered
interrupt by sort of treating it as an edge-triggered interrupt,
following from the fact that we only observe an asserting edge.  This
requires us to be a bit careful when populating the LRs and when folding
the state back in though:

 * We lower the line level when populating the LR, so that when
   subsequently observing an asserting edge, the VGIC will do the right
   thing.

 * If the guest never acked the interrupt while running (for example if
   it had masked interrupts at the CPU level while running), we have
   to preserve the pending state of the LR and move it back to the
   line_level field of the struct irq when folding LR state.

   If the guest never acked the interrupt while running, but changed the
   device state and lowered the line (again with interrupts masked) then
   we need to observe this change in the line_level.

   Both of the above situations are solved by sampling the physical line
   and set the line level when folding the LR back.

 * Finally, if the guest never acked the interrupt while running and
   sampling the line reveals that the device state has changed and the
   line has been lowered, we must clear the physical active state, since
   we will otherwise never be told when the interrupt becomes asserted
   again.

This has the added benefit of making the timer optimization patches
(https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
bit simpler, because the timer code doesn't have to clear the active
state on the sync anymore.  It also potentially improves the performance
of the timer implementation because the GIC knows the state or the LR
and only needs to clear the
active state when the pending bit in the LR is still set, where the
timer has to always clear it when returning from running the guest with
an injected timer interrupt.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.h    |  7 +++++++
 4 files changed, 88 insertions(+)

Comments

Eric Auger Aug. 30, 2017, 8:19 a.m. UTC | #1
Hi Christoffer,

On 29/08/2017 11:39, Christoffer Dall wrote:
> Level-triggered mapped IRQs are special because we only observe rising
> edges as input to the VGIC, and we don't set the EOI flag and therefore
> are not told when the level goes down, so that we can re-queue a new
> interrupt when the level goes up.
> 
> One way to solve this problem is to side-step the logic of the VGIC and
> special case the validation in the injection path, but it has the
> unfortunate drawback of having to peak into the physical GIC state
> whenever we want to know if the interrupt is pending on the virtual
> distributor.
> 
> Instead, we can maintain the current semantics of a level triggered
> interrupt by sort of treating it as an edge-triggered interrupt,
> following from the fact that we only observe an asserting edge.  This
> requires us to be a bit careful when populating the LRs and when folding
> the state back in though:
> 
>  * We lower the line level when populating the LR, so that when
>    subsequently observing an asserting edge, the VGIC will do the right
>    thing.
> 
>  * If the guest never acked the interrupt while running (for example if
>    it had masked interrupts at the CPU level while running), we have
>    to preserve the pending state of the LR and move it back to the
>    line_level field of the struct irq when folding LR state.
> 
>    If the guest never acked the interrupt while running, but changed the
>    device state and lowered the line (again with interrupts masked) then
>    we need to observe this change in the line_level.
> 
>    Both of the above situations are solved by sampling the physical line
>    and set the line level when folding the LR back.
> 
>  * Finally, if the guest never acked the interrupt while running and
>    sampling the line reveals that the device state has changed and the
>    line has been lowered, we must clear the physical active state, since
>    we will otherwise never be told when the interrupt becomes asserted
>    again.
> 
> This has the added benefit of making the timer optimization patches
> (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
> bit simpler, because the timer code doesn't have to clear the active
> state on the sync anymore.  It also potentially improves the performance
> of the timer implementation because the GIC knows the state or the LR
> and only needs to clear the
> active state when the pending bit in the LR is still set, where the
> timer has to always clear it when returning from running the guest with
> an injected timer interrupt.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
>  4 files changed, 88 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e4187e5..f7c5cb5 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  				irq->pending_latch = false;
>  		}
>  
> +		/*
> +		 * Level-triggered mapped IRQs are special because we only
> +		 * observe rising edges as input to the VGIC.
> +		 *
> +		 * If the guest never acked the interrupt we have to sample
> +		 * the physical line and set the line level, because the
> +		 * device state could have changed or we simply need to
> +		 * process the still pending interrupt later.
> +		 *
> +		 * If this causes us to lower the level, we have to also clear
> +		 * the physical active state, since we will otherwise never be
> +		 * told when the interrupt becomes asserted again.
> +		 */
> +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> +			irq->line_level = vgic_irq_line_level(irq);
> +
> +			if (!irq->line_level)
> +				vgic_irq_clear_phys_active(irq);
> +		}
> +
>  		spin_unlock(&irq->irq_lock);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> @@ -161,6 +181,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  			val |= GICH_LR_EOI;
>  	}
>  
> +	/*
> +	 * Level-triggered mapped IRQs are special because we only observe
> +	 * rising edges as input to the VGIC.  We therefore lower the line
> +	 * level here, so that we can take new virtual IRQs.  See
> +	 * vgic_v2_fold_lr_state for more info.
> +	 */
> +	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
> +		irq->line_level = false;
> +
>  	/* The GICv2 LR only holds five bits of priority. */
>  	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 96ea597..e377036 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -94,6 +94,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  				irq->pending_latch = false;
>  		}
>  
> +		/*
> +		 * Level-triggered mapped IRQs are special because we only
> +		 * observe rising edges as input to the VGIC.
> +		 *
> +		 * If the guest never acked the interrupt we have to sample
> +		 * the physical line and set the line level, because the
> +		 * device state could have changed or we simply need to
> +		 * process the still pending interrupt later.
> +		 *
> +		 * If this causes us to lower the level, we have to also clear
> +		 * the physical active state, since we will otherwise never be
> +		 * told when the interrupt becomes asserted again.
> +		 */
> +		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
> +			irq->line_level = vgic_irq_line_level(irq);

I don't see anything related to the GICD_ISPENDING/ICPENDING. In the
last thread, we said we were obliged to set the pending bit on the
physical distributor to avoid the guest deactivating a non active
physical IRQ. If we still intend to do so, with that addition, aren't we
likely to deactivate the physical IRQ before the guest?

Thanks

Eric
> +
> +			if (!irq->line_level)
> +				vgic_irq_clear_phys_active(irq);
> +		}
> +
>  		spin_unlock(&irq->irq_lock);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> @@ -144,6 +164,15 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  	}
>  
>  	/*
> +	 * Level-triggered mapped IRQs are special because we only observe
> +	 * rising edges as input to the VGIC.  We therefore lower the line
> +	 * level here, so that we can take new virtual IRQs.  See
> +	 * vgic_v3_fold_lr_state for more info.
> +	 */
> +	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
> +		irq->line_level = false;
> +
> +	/*
>  	 * We currently only support Group1 interrupts, which is a
>  	 * known defect. This needs to be addressed at some point.
>  	 */
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 9d557efd..2691a0a 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -140,6 +140,29 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>  	kfree(irq);
>  }
>  
> +/* Get the input level of a mapped IRQ directly from the physical GIC */
> +bool vgic_irq_line_level(struct vgic_irq *irq)
> +{
> +	bool line_level;
> +
> +	BUG_ON(!irq->hw);
> +
> +	WARN_ON(irq_get_irqchip_state(irq->host_irq,
> +				      IRQCHIP_STATE_PENDING,
> +				      &line_level));
> +	return line_level;
> +}
> +
> +/* Clear the physical active state */
> +void vgic_irq_clear_phys_active(struct vgic_irq *irq)
> +{
> +
> +	BUG_ON(!irq->hw);
> +	WARN_ON(irq_set_irqchip_state(irq->host_irq,
> +				      IRQCHIP_STATE_ACTIVE,
> +				      false));
> +}
> +
>  /**
>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>   *
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index bba7fa2..22f106d 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -104,6 +104,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
>  		return irq->pending_latch || irq->line_level;
>  }
>  
> +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
> +{
> +	return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
> +}
> +
>  /*
>   * This struct provides an intermediate representation of the fields contained
>   * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
> @@ -140,6 +145,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid);
>  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> +bool vgic_irq_line_level(struct vgic_irq *irq);
> +void vgic_irq_clear_phys_active(struct vgic_irq *irq);
>  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
>  void vgic_kick_vcpus(struct kvm *kvm);
>  
>
Christoffer Dall Aug. 30, 2017, 9:20 a.m. UTC | #2
On Wed, Aug 30, 2017 at 10:19:38AM +0200, Auger Eric wrote:
> Hi Christoffer,
> 
> On 29/08/2017 11:39, Christoffer Dall wrote:
> > Level-triggered mapped IRQs are special because we only observe rising
> > edges as input to the VGIC, and we don't set the EOI flag and therefore
> > are not told when the level goes down, so that we can re-queue a new
> > interrupt when the level goes up.
> > 
> > One way to solve this problem is to side-step the logic of the VGIC and
> > special case the validation in the injection path, but it has the
> > unfortunate drawback of having to peak into the physical GIC state
> > whenever we want to know if the interrupt is pending on the virtual
> > distributor.
> > 
> > Instead, we can maintain the current semantics of a level triggered
> > interrupt by sort of treating it as an edge-triggered interrupt,
> > following from the fact that we only observe an asserting edge.  This
> > requires us to be a bit careful when populating the LRs and when folding
> > the state back in though:
> > 
> >  * We lower the line level when populating the LR, so that when
> >    subsequently observing an asserting edge, the VGIC will do the right
> >    thing.
> > 
> >  * If the guest never acked the interrupt while running (for example if
> >    it had masked interrupts at the CPU level while running), we have
> >    to preserve the pending state of the LR and move it back to the
> >    line_level field of the struct irq when folding LR state.
> > 
> >    If the guest never acked the interrupt while running, but changed the
> >    device state and lowered the line (again with interrupts masked) then
> >    we need to observe this change in the line_level.
> > 
> >    Both of the above situations are solved by sampling the physical line
> >    and set the line level when folding the LR back.
> > 
> >  * Finally, if the guest never acked the interrupt while running and
> >    sampling the line reveals that the device state has changed and the
> >    line has been lowered, we must clear the physical active state, since
> >    we will otherwise never be told when the interrupt becomes asserted
> >    again.
> > 
> > This has the added benefit of making the timer optimization patches
> > (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
> > bit simpler, because the timer code doesn't have to clear the active
> > state on the sync anymore.  It also potentially improves the performance
> > of the timer implementation because the GIC knows the state or the LR
> > and only needs to clear the
> > active state when the pending bit in the LR is still set, where the
> > timer has to always clear it when returning from running the guest with
> > an injected timer interrupt.
> > 
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
> >  4 files changed, 88 insertions(+)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> > index e4187e5..f7c5cb5 100644
> > --- a/virt/kvm/arm/vgic/vgic-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-v2.c
> > @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > +		/*
> > +		 * Level-triggered mapped IRQs are special because we only
> > +		 * observe rising edges as input to the VGIC.
> > +		 *
> > +		 * If the guest never acked the interrupt we have to sample
> > +		 * the physical line and set the line level, because the
> > +		 * device state could have changed or we simply need to
> > +		 * process the still pending interrupt later.
> > +		 *
> > +		 * If this causes us to lower the level, we have to also clear
> > +		 * the physical active state, since we will otherwise never be
> > +		 * told when the interrupt becomes asserted again.
> > +		 */
> > +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> > +			irq->line_level = vgic_irq_line_level(irq);
> > +
> > +			if (!irq->line_level)
> > +				vgic_irq_clear_phys_active(irq);
> > +		}
> > +
> >  		spin_unlock(&irq->irq_lock);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -161,6 +181,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
> >  			val |= GICH_LR_EOI;
> >  	}
> >  
> > +	/*
> > +	 * Level-triggered mapped IRQs are special because we only observe
> > +	 * rising edges as input to the VGIC.  We therefore lower the line
> > +	 * level here, so that we can take new virtual IRQs.  See
> > +	 * vgic_v2_fold_lr_state for more info.
> > +	 */
> > +	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
> > +		irq->line_level = false;
> > +
> >  	/* The GICv2 LR only holds five bits of priority. */
> >  	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> > index 96ea597..e377036 100644
> > --- a/virt/kvm/arm/vgic/vgic-v3.c
> > +++ b/virt/kvm/arm/vgic/vgic-v3.c
> > @@ -94,6 +94,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > +		/*
> > +		 * Level-triggered mapped IRQs are special because we only
> > +		 * observe rising edges as input to the VGIC.
> > +		 *
> > +		 * If the guest never acked the interrupt we have to sample
> > +		 * the physical line and set the line level, because the
> > +		 * device state could have changed or we simply need to
> > +		 * process the still pending interrupt later.
> > +		 *
> > +		 * If this causes us to lower the level, we have to also clear
> > +		 * the physical active state, since we will otherwise never be
> > +		 * told when the interrupt becomes asserted again.
> > +		 */
> > +		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
> > +			irq->line_level = vgic_irq_line_level(irq);
> 
> I don't see anything related to the GICD_ISPENDING/ICPENDING.

vgic_irq_line_level() reads GICD_ISPEND.  Not sure what you mean?

> In the
> last thread, we said we were obliged to set the pending bit on the
> physical distributor to avoid the guest deactivating a non active
> physical IRQ.

Did we?  I don't remember this, and this doesn't make sense to me.  The
only constraint I think we have is that when setting the HW bit in the
LR, the interrupt must be active on the physical side, so that a
deactivate by the guest is also a deactivate on the physical side.

> If we still intend to do so, with that addition, aren't we
> likely to deactivate the physical IRQ before the guest?

Do you mean if it's pending+active in the LR?  That can never happen for
a mapped (HW bit set) interrupt.

So this particular code sequence means that
    (val & ICH_LR_STATE) == ICH_LR_PENDING_BIT

and we never clear the physical active state while the virtual interrupt
is active.

There is an interesting point in terms of handling guest accesses to the
virtual distributor GICD_ISPENDR, GICD_ICPENDR, GICD_ISACTIVER, and
GICD_ICACTIVER where we must also propogate those changes to the
physical side.

Was this what you meant?

Thanks,
-Christoffer


> > +
> > +			if (!irq->line_level)
> > +				vgic_irq_clear_phys_active(irq);
> > +		}
> > +
> >  		spin_unlock(&irq->irq_lock);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -144,6 +164,15 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
> >  	}
> >  
> >  	/*
> > +	 * Level-triggered mapped IRQs are special because we only observe
> > +	 * rising edges as input to the VGIC.  We therefore lower the line
> > +	 * level here, so that we can take new virtual IRQs.  See
> > +	 * vgic_v3_fold_lr_state for more info.
> > +	 */
> > +	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
> > +		irq->line_level = false;
> > +
> > +	/*
> >  	 * We currently only support Group1 interrupts, which is a
> >  	 * known defect. This needs to be addressed at some point.
> >  	 */
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 9d557efd..2691a0a 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -140,6 +140,29 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> >  	kfree(irq);
> >  }
> >  
> > +/* Get the input level of a mapped IRQ directly from the physical GIC */
> > +bool vgic_irq_line_level(struct vgic_irq *irq)
> > +{
> > +	bool line_level;
> > +
> > +	BUG_ON(!irq->hw);
> > +
> > +	WARN_ON(irq_get_irqchip_state(irq->host_irq,
> > +				      IRQCHIP_STATE_PENDING,
> > +				      &line_level));
> > +	return line_level;
> > +}
> > +
> > +/* Clear the physical active state */
> > +void vgic_irq_clear_phys_active(struct vgic_irq *irq)
> > +{
> > +
> > +	BUG_ON(!irq->hw);
> > +	WARN_ON(irq_set_irqchip_state(irq->host_irq,
> > +				      IRQCHIP_STATE_ACTIVE,
> > +				      false));
> > +}
> > +
> >  /**
> >   * kvm_vgic_target_oracle - compute the target vcpu for an irq
> >   *
> > diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> > index bba7fa2..22f106d 100644
> > --- a/virt/kvm/arm/vgic/vgic.h
> > +++ b/virt/kvm/arm/vgic/vgic.h
> > @@ -104,6 +104,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
> >  		return irq->pending_latch || irq->line_level;
> >  }
> >  
> > +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
> > +{
> > +	return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
> > +}
> > +
> >  /*
> >   * This struct provides an intermediate representation of the fields contained
> >   * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
> > @@ -140,6 +145,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
> >  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> >  			      u32 intid);
> >  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> > +bool vgic_irq_line_level(struct vgic_irq *irq);
> > +void vgic_irq_clear_phys_active(struct vgic_irq *irq);
> >  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
> >  void vgic_kick_vcpus(struct kvm *kvm);
> >  
> >
Eric Auger Aug. 30, 2017, 10:13 a.m. UTC | #3
Hi Christoffer,

On 30/08/2017 11:20, Christoffer Dall wrote:
> On Wed, Aug 30, 2017 at 10:19:38AM +0200, Auger Eric wrote:
>> Hi Christoffer,
>>
>> On 29/08/2017 11:39, Christoffer Dall wrote:
>>> Level-triggered mapped IRQs are special because we only observe rising
>>> edges as input to the VGIC, and we don't set the EOI flag and therefore
>>> are not told when the level goes down, so that we can re-queue a new
>>> interrupt when the level goes up.
>>>
>>> One way to solve this problem is to side-step the logic of the VGIC and
>>> special case the validation in the injection path, but it has the
>>> unfortunate drawback of having to peak into the physical GIC state
>>> whenever we want to know if the interrupt is pending on the virtual
>>> distributor.
>>>
>>> Instead, we can maintain the current semantics of a level triggered
>>> interrupt by sort of treating it as an edge-triggered interrupt,
>>> following from the fact that we only observe an asserting edge.  This
>>> requires us to be a bit careful when populating the LRs and when folding
>>> the state back in though:
>>>
>>>  * We lower the line level when populating the LR, so that when
>>>    subsequently observing an asserting edge, the VGIC will do the right
>>>    thing.
>>>
>>>  * If the guest never acked the interrupt while running (for example if
>>>    it had masked interrupts at the CPU level while running), we have
>>>    to preserve the pending state of the LR and move it back to the
>>>    line_level field of the struct irq when folding LR state.
>>>
>>>    If the guest never acked the interrupt while running, but changed the
>>>    device state and lowered the line (again with interrupts masked) then
>>>    we need to observe this change in the line_level.
>>>
>>>    Both of the above situations are solved by sampling the physical line
>>>    and set the line level when folding the LR back.
>>>
>>>  * Finally, if the guest never acked the interrupt while running and
>>>    sampling the line reveals that the device state has changed and the
>>>    line has been lowered, we must clear the physical active state, since
>>>    we will otherwise never be told when the interrupt becomes asserted
>>>    again.
>>>
>>> This has the added benefit of making the timer optimization patches
>>> (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
>>> bit simpler, because the timer code doesn't have to clear the active
>>> state on the sync anymore.  It also potentially improves the performance
>>> of the timer implementation because the GIC knows the state or the LR
>>> and only needs to clear the
>>> active state when the pending bit in the LR is still set, where the
>>> timer has to always clear it when returning from running the guest with
>>> an injected timer interrupt.
>>>
>>> Signed-off-by: Christoffer Dall <cdall@linaro.org>
>>> ---
>>>  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
>>>  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
>>>  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
>>>  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
>>>  4 files changed, 88 insertions(+)
>>>
>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
>>> index e4187e5..f7c5cb5 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>> @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>>>  				irq->pending_latch = false;
>>>  		}
>>>  
>>> +		/*
>>> +		 * Level-triggered mapped IRQs are special because we only
>>> +		 * observe rising edges as input to the VGIC.
>>> +		 *
>>> +		 * If the guest never acked the interrupt we have to sample
>>> +		 * the physical line and set the line level, because the
>>> +		 * device state could have changed or we simply need to
>>> +		 * process the still pending interrupt later.
>>> +		 *
>>> +		 * If this causes us to lower the level, we have to also clear
>>> +		 * the physical active state, since we will otherwise never be
>>> +		 * told when the interrupt becomes asserted again.
>>> +		 */
>>> +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
>>> +			irq->line_level = vgic_irq_line_level(irq);
>>> +
>>> +			if (!irq->line_level)
>>> +				vgic_irq_clear_phys_active(irq);
>>> +		}
>>> +
>>>  		spin_unlock(&irq->irq_lock);
>>>  		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>> @@ -161,6 +181,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>>>  			val |= GICH_LR_EOI;
>>>  	}
>>>  
>>> +	/*
>>> +	 * Level-triggered mapped IRQs are special because we only observe
>>> +	 * rising edges as input to the VGIC.  We therefore lower the line
>>> +	 * level here, so that we can take new virtual IRQs.  See
>>> +	 * vgic_v2_fold_lr_state for more info.
>>> +	 */
>>> +	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
>>> +		irq->line_level = false;
>>> +
>>>  	/* The GICv2 LR only holds five bits of priority. */
>>>  	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
>>>  
>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>> index 96ea597..e377036 100644
>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>> @@ -94,6 +94,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>>  				irq->pending_latch = false;
>>>  		}
>>>  
>>> +		/*
>>> +		 * Level-triggered mapped IRQs are special because we only
>>> +		 * observe rising edges as input to the VGIC.
>>> +		 *
>>> +		 * If the guest never acked the interrupt we have to sample
>>> +		 * the physical line and set the line level, because the
>>> +		 * device state could have changed or we simply need to
>>> +		 * process the still pending interrupt later.
>>> +		 *
>>> +		 * If this causes us to lower the level, we have to also clear
>>> +		 * the physical active state, since we will otherwise never be
>>> +		 * told when the interrupt becomes asserted again.
>>> +		 */
>>> +		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
>>> +			irq->line_level = vgic_irq_line_level(irq);
>>
>> I don't see anything related to the GICD_ISPENDING/ICPENDING.
> 
> vgic_irq_line_level() reads GICD_ISPEND.  Not sure what you mean?
> 
>> In the
>> last thread, we said we were obliged to set the pending bit on the
>> physical distributor to avoid the guest deactivating a non active
>> physical IRQ.
> 
> Did we?  I don't remember this, and this doesn't make sense to me.  The
> only constraint I think we have is that when setting the HW bit in the
> LR, the interrupt must be active on the physical side, so that a
> deactivate by the guest is also a deactivate on the physical side.

That what I understood from this thread:
https://lkml.org/lkml/2017/7/25/534

At the moment the LR HW bit is set whatever the source of the IRQ I
think, HW driven or ISPENDRn driven (we only test irq->hw)


> 
>> If we still intend to do so, with that addition, aren't we
>> likely to deactivate the physical IRQ before the guest?
> 
> Do you mean if it's pending+active in the LR?  That can never happen for
> a mapped (HW bit set) interrupt.
> 
> So this particular code sequence means that
>     (val & ICH_LR_STATE) == ICH_LR_PENDING_BIT
> 
> and we never clear the physical active state while the virtual interrupt
> is active.

No my scenario was:
IRQ originates from a ISPENDRn. I Thought we set the physical
distributor state accordingly. pIRQ is active. vIRQ is pending. On fold
we detect the line is low and we deactivate the pIRQ. pending_latch is
still true. The vIRQ will be processed but its physical IRQ has been
deactivated.

> There is an interesting point in terms of handling guest accesses to the
> virtual distributor GICD_ISPENDR, GICD_ICPENDR, GICD_ISACTIVER, and
> GICD_ICACTIVER where we must also propogate those changes to the
> physical side.
> 
> Was this what you meant?
yes I think we also need to address this to get the whole picture.

Thanks

Eric
> 
> Thanks,
> -Christoffer
> 
> 
>>> +
>>> +			if (!irq->line_level)
>>> +				vgic_irq_clear_phys_active(irq);
>>> +		}
>>> +
>>>  		spin_unlock(&irq->irq_lock);
>>>  		vgic_put_irq(vcpu->kvm, irq);
>>>  	}
>>> @@ -144,6 +164,15 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>>>  	}
>>>  
>>>  	/*
>>> +	 * Level-triggered mapped IRQs are special because we only observe
>>> +	 * rising edges as input to the VGIC.  We therefore lower the line
>>> +	 * level here, so that we can take new virtual IRQs.  See
>>> +	 * vgic_v3_fold_lr_state for more info.
>>> +	 */
>>> +	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
>>> +		irq->line_level = false;
>>> +
>>> +	/*
>>>  	 * We currently only support Group1 interrupts, which is a
>>>  	 * known defect. This needs to be addressed at some point.
>>>  	 */
>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>> index 9d557efd..2691a0a 100644
>>> --- a/virt/kvm/arm/vgic/vgic.c
>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>> @@ -140,6 +140,29 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>>>  	kfree(irq);
>>>  }
>>>  
>>> +/* Get the input level of a mapped IRQ directly from the physical GIC */
>>> +bool vgic_irq_line_level(struct vgic_irq *irq)
>>> +{
>>> +	bool line_level;
>>> +
>>> +	BUG_ON(!irq->hw);
>>> +
>>> +	WARN_ON(irq_get_irqchip_state(irq->host_irq,
>>> +				      IRQCHIP_STATE_PENDING,
>>> +				      &line_level));
>>> +	return line_level;
>>> +}
>>> +
>>> +/* Clear the physical active state */
>>> +void vgic_irq_clear_phys_active(struct vgic_irq *irq)
>>> +{
>>> +
>>> +	BUG_ON(!irq->hw);
>>> +	WARN_ON(irq_set_irqchip_state(irq->host_irq,
>>> +				      IRQCHIP_STATE_ACTIVE,
>>> +				      false));
>>> +}
>>> +
>>>  /**
>>>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>   *
>>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>>> index bba7fa2..22f106d 100644
>>> --- a/virt/kvm/arm/vgic/vgic.h
>>> +++ b/virt/kvm/arm/vgic/vgic.h
>>> @@ -104,6 +104,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
>>>  		return irq->pending_latch || irq->line_level;
>>>  }
>>>  
>>> +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
>>> +{
>>> +	return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
>>> +}
>>> +
>>>  /*
>>>   * This struct provides an intermediate representation of the fields contained
>>>   * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
>>> @@ -140,6 +145,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
>>>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>  			      u32 intid);
>>>  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
>>> +bool vgic_irq_line_level(struct vgic_irq *irq);
>>> +void vgic_irq_clear_phys_active(struct vgic_irq *irq);
>>>  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
>>>  void vgic_kick_vcpus(struct kvm *kvm);
>>>  
>>>
Christoffer Dall Aug. 30, 2017, 12:03 p.m. UTC | #4
On Wed, Aug 30, 2017 at 12:13:32PM +0200, Auger Eric wrote:
> Hi Christoffer,
> 
> On 30/08/2017 11:20, Christoffer Dall wrote:
> > On Wed, Aug 30, 2017 at 10:19:38AM +0200, Auger Eric wrote:
> >> Hi Christoffer,
> >>
> >> On 29/08/2017 11:39, Christoffer Dall wrote:
> >>> Level-triggered mapped IRQs are special because we only observe rising
> >>> edges as input to the VGIC, and we don't set the EOI flag and therefore
> >>> are not told when the level goes down, so that we can re-queue a new
> >>> interrupt when the level goes up.
> >>>
> >>> One way to solve this problem is to side-step the logic of the VGIC and
> >>> special case the validation in the injection path, but it has the
> >>> unfortunate drawback of having to peak into the physical GIC state
> >>> whenever we want to know if the interrupt is pending on the virtual
> >>> distributor.
> >>>
> >>> Instead, we can maintain the current semantics of a level triggered
> >>> interrupt by sort of treating it as an edge-triggered interrupt,
> >>> following from the fact that we only observe an asserting edge.  This
> >>> requires us to be a bit careful when populating the LRs and when folding
> >>> the state back in though:
> >>>
> >>>  * We lower the line level when populating the LR, so that when
> >>>    subsequently observing an asserting edge, the VGIC will do the right
> >>>    thing.
> >>>
> >>>  * If the guest never acked the interrupt while running (for example if
> >>>    it had masked interrupts at the CPU level while running), we have
> >>>    to preserve the pending state of the LR and move it back to the
> >>>    line_level field of the struct irq when folding LR state.
> >>>
> >>>    If the guest never acked the interrupt while running, but changed the
> >>>    device state and lowered the line (again with interrupts masked) then
> >>>    we need to observe this change in the line_level.
> >>>
> >>>    Both of the above situations are solved by sampling the physical line
> >>>    and set the line level when folding the LR back.
> >>>
> >>>  * Finally, if the guest never acked the interrupt while running and
> >>>    sampling the line reveals that the device state has changed and the
> >>>    line has been lowered, we must clear the physical active state, since
> >>>    we will otherwise never be told when the interrupt becomes asserted
> >>>    again.
> >>>
> >>> This has the added benefit of making the timer optimization patches
> >>> (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
> >>> bit simpler, because the timer code doesn't have to clear the active
> >>> state on the sync anymore.  It also potentially improves the performance
> >>> of the timer implementation because the GIC knows the state or the LR
> >>> and only needs to clear the
> >>> active state when the pending bit in the LR is still set, where the
> >>> timer has to always clear it when returning from running the guest with
> >>> an injected timer interrupt.
> >>>
> >>> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> >>> ---
> >>>  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
> >>>  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
> >>>  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
> >>>  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
> >>>  4 files changed, 88 insertions(+)
> >>>
> >>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> >>> index e4187e5..f7c5cb5 100644
> >>> --- a/virt/kvm/arm/vgic/vgic-v2.c
> >>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> >>> @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >>>  				irq->pending_latch = false;
> >>>  		}
> >>>  
> >>> +		/*
> >>> +		 * Level-triggered mapped IRQs are special because we only
> >>> +		 * observe rising edges as input to the VGIC.
> >>> +		 *
> >>> +		 * If the guest never acked the interrupt we have to sample
> >>> +		 * the physical line and set the line level, because the
> >>> +		 * device state could have changed or we simply need to
> >>> +		 * process the still pending interrupt later.
> >>> +		 *
> >>> +		 * If this causes us to lower the level, we have to also clear
> >>> +		 * the physical active state, since we will otherwise never be
> >>> +		 * told when the interrupt becomes asserted again.
> >>> +		 */
> >>> +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> >>> +			irq->line_level = vgic_irq_line_level(irq);
> >>> +
> >>> +			if (!irq->line_level)
> >>> +				vgic_irq_clear_phys_active(irq);
> >>> +		}
> >>> +
> >>>  		spin_unlock(&irq->irq_lock);
> >>>  		vgic_put_irq(vcpu->kvm, irq);
> >>>  	}
> >>> @@ -161,6 +181,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
> >>>  			val |= GICH_LR_EOI;
> >>>  	}
> >>>  
> >>> +	/*
> >>> +	 * Level-triggered mapped IRQs are special because we only observe
> >>> +	 * rising edges as input to the VGIC.  We therefore lower the line
> >>> +	 * level here, so that we can take new virtual IRQs.  See
> >>> +	 * vgic_v2_fold_lr_state for more info.
> >>> +	 */
> >>> +	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
> >>> +		irq->line_level = false;
> >>> +
> >>>  	/* The GICv2 LR only holds five bits of priority. */
> >>>  	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
> >>>  
> >>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> >>> index 96ea597..e377036 100644
> >>> --- a/virt/kvm/arm/vgic/vgic-v3.c
> >>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> >>> @@ -94,6 +94,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >>>  				irq->pending_latch = false;
> >>>  		}
> >>>  
> >>> +		/*
> >>> +		 * Level-triggered mapped IRQs are special because we only
> >>> +		 * observe rising edges as input to the VGIC.
> >>> +		 *
> >>> +		 * If the guest never acked the interrupt we have to sample
> >>> +		 * the physical line and set the line level, because the
> >>> +		 * device state could have changed or we simply need to
> >>> +		 * process the still pending interrupt later.
> >>> +		 *
> >>> +		 * If this causes us to lower the level, we have to also clear
> >>> +		 * the physical active state, since we will otherwise never be
> >>> +		 * told when the interrupt becomes asserted again.
> >>> +		 */
> >>> +		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
> >>> +			irq->line_level = vgic_irq_line_level(irq);
> >>
> >> I don't see anything related to the GICD_ISPENDING/ICPENDING.
> > 
> > vgic_irq_line_level() reads GICD_ISPEND.  Not sure what you mean?
> > 
> >> In the
> >> last thread, we said we were obliged to set the pending bit on the
> >> physical distributor to avoid the guest deactivating a non active
> >> physical IRQ.
> > 
> > Did we?  I don't remember this, and this doesn't make sense to me.  The
> > only constraint I think we have is that when setting the HW bit in the
> > LR, the interrupt must be active on the physical side, so that a
> > deactivate by the guest is also a deactivate on the physical side.
> 
> That what I understood from this thread:
> https://lkml.org/lkml/2017/7/25/534

Right, but that was specific to how we handle the guest writing to
PEND/ACT on the virtual distributor.

> 
> At the moment the LR HW bit is set whatever the source of the IRQ I
> think, HW driven or ISPENDRn driven (we only test irq->hw)
> 

Ah, I didn't actually consider that we could special-case the
pending_latch for a HW interrupt.  I suppose we should actually consider
that if someone tries to map an edge-triggered SPI.  Let me have a look
at that.

> 
> > 
> >> If we still intend to do so, with that addition, aren't we
> >> likely to deactivate the physical IRQ before the guest?
> > 
> > Do you mean if it's pending+active in the LR?  That can never happen for
> > a mapped (HW bit set) interrupt.
> > 
> > So this particular code sequence means that
> >     (val & ICH_LR_STATE) == ICH_LR_PENDING_BIT
> > 
> > and we never clear the physical active state while the virtual interrupt
> > is active.
> 
> No my scenario was:
> IRQ originates from a ISPENDRn.

virtual IRQ from a write to the VGIC ISPENDRn, correct?

> I Thought we set the physical
> distributor state accordingly. pIRQ is active. vIRQ is pending. On fold
> we detect the line is low and we deactivate the pIRQ. pending_latch is
> still true. The vIRQ will be processed but its physical IRQ has been
> deactivated.
> 

As long as we properly handle pending_latch and guest writes to the VGIC
PEND/ACT registers, I don't see a problem with this code.

> > There is an interesting point in terms of handling guest accesses to the
> > virtual distributor GICD_ISPENDR, GICD_ICPENDR, GICD_ISACTIVER, and
> > GICD_ICACTIVER where we must also propogate those changes to the
> > physical side.
> > 
> > Was this what you meant?
> yes I think we also need to address this to get the whole picture.
> 

Yes, I'll respin.

Thanks,
-Christoffer
Eric Auger Aug. 30, 2017, 12:57 p.m. UTC | #5
Hi Christoffer,

On 30/08/2017 14:03, Christoffer Dall wrote:
> On Wed, Aug 30, 2017 at 12:13:32PM +0200, Auger Eric wrote:
>> Hi Christoffer,
>>
>> On 30/08/2017 11:20, Christoffer Dall wrote:
>>> On Wed, Aug 30, 2017 at 10:19:38AM +0200, Auger Eric wrote:
>>>> Hi Christoffer,
>>>>
>>>> On 29/08/2017 11:39, Christoffer Dall wrote:
>>>>> Level-triggered mapped IRQs are special because we only observe rising
>>>>> edges as input to the VGIC, and we don't set the EOI flag and therefore
>>>>> are not told when the level goes down, so that we can re-queue a new
>>>>> interrupt when the level goes up.
>>>>>
>>>>> One way to solve this problem is to side-step the logic of the VGIC and
>>>>> special case the validation in the injection path, but it has the
>>>>> unfortunate drawback of having to peak into the physical GIC state
>>>>> whenever we want to know if the interrupt is pending on the virtual
>>>>> distributor.
>>>>>
>>>>> Instead, we can maintain the current semantics of a level triggered
>>>>> interrupt by sort of treating it as an edge-triggered interrupt,
>>>>> following from the fact that we only observe an asserting edge.  This
>>>>> requires us to be a bit careful when populating the LRs and when folding
>>>>> the state back in though:
>>>>>
>>>>>  * We lower the line level when populating the LR, so that when
>>>>>    subsequently observing an asserting edge, the VGIC will do the right
>>>>>    thing.
>>>>>
>>>>>  * If the guest never acked the interrupt while running (for example if
>>>>>    it had masked interrupts at the CPU level while running), we have
>>>>>    to preserve the pending state of the LR and move it back to the
>>>>>    line_level field of the struct irq when folding LR state.
>>>>>
>>>>>    If the guest never acked the interrupt while running, but changed the
>>>>>    device state and lowered the line (again with interrupts masked) then
>>>>>    we need to observe this change in the line_level.
>>>>>
>>>>>    Both of the above situations are solved by sampling the physical line
>>>>>    and set the line level when folding the LR back.
>>>>>
>>>>>  * Finally, if the guest never acked the interrupt while running and
>>>>>    sampling the line reveals that the device state has changed and the
>>>>>    line has been lowered, we must clear the physical active state, since
>>>>>    we will otherwise never be told when the interrupt becomes asserted
>>>>>    again.
>>>>>
>>>>> This has the added benefit of making the timer optimization patches
>>>>> (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
>>>>> bit simpler, because the timer code doesn't have to clear the active
>>>>> state on the sync anymore.  It also potentially improves the performance
>>>>> of the timer implementation because the GIC knows the state or the LR
>>>>> and only needs to clear the
>>>>> active state when the pending bit in the LR is still set, where the
>>>>> timer has to always clear it when returning from running the guest with
>>>>> an injected timer interrupt.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <cdall@linaro.org>
>>>>> ---
>>>>>  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
>>>>>  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
>>>>>  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
>>>>>  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
>>>>>  4 files changed, 88 insertions(+)
>>>>>
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
>>>>> index e4187e5..f7c5cb5 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-v2.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-v2.c
>>>>> @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>>>>>  				irq->pending_latch = false;
>>>>>  		}
>>>>>  
>>>>> +		/*
>>>>> +		 * Level-triggered mapped IRQs are special because we only
>>>>> +		 * observe rising edges as input to the VGIC.
>>>>> +		 *
>>>>> +		 * If the guest never acked the interrupt we have to sample
>>>>> +		 * the physical line and set the line level, because the
>>>>> +		 * device state could have changed or we simply need to
>>>>> +		 * process the still pending interrupt later.
>>>>> +		 *
>>>>> +		 * If this causes us to lower the level, we have to also clear
>>>>> +		 * the physical active state, since we will otherwise never be
>>>>> +		 * told when the interrupt becomes asserted again.
>>>>> +		 */
>>>>> +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
>>>>> +			irq->line_level = vgic_irq_line_level(irq);
>>>>> +
>>>>> +			if (!irq->line_level)
>>>>> +				vgic_irq_clear_phys_active(irq);
>>>>> +		}
>>>>> +
>>>>>  		spin_unlock(&irq->irq_lock);
>>>>>  		vgic_put_irq(vcpu->kvm, irq);
>>>>>  	}
>>>>> @@ -161,6 +181,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>>>>>  			val |= GICH_LR_EOI;
>>>>>  	}
>>>>>  
>>>>> +	/*
>>>>> +	 * Level-triggered mapped IRQs are special because we only observe
>>>>> +	 * rising edges as input to the VGIC.  We therefore lower the line
>>>>> +	 * level here, so that we can take new virtual IRQs.  See
>>>>> +	 * vgic_v2_fold_lr_state for more info.
>>>>> +	 */
>>>>> +	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
>>>>> +		irq->line_level = false;
>>>>> +
>>>>>  	/* The GICv2 LR only holds five bits of priority. */
>>>>>  	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
>>>>>  
>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>>>> index 96ea597..e377036 100644
>>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>>>> @@ -94,6 +94,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>>>>>  				irq->pending_latch = false;
>>>>>  		}
>>>>>  
>>>>> +		/*
>>>>> +		 * Level-triggered mapped IRQs are special because we only
>>>>> +		 * observe rising edges as input to the VGIC.
>>>>> +		 *
>>>>> +		 * If the guest never acked the interrupt we have to sample
>>>>> +		 * the physical line and set the line level, because the
>>>>> +		 * device state could have changed or we simply need to
>>>>> +		 * process the still pending interrupt later.
>>>>> +		 *
>>>>> +		 * If this causes us to lower the level, we have to also clear
>>>>> +		 * the physical active state, since we will otherwise never be
>>>>> +		 * told when the interrupt becomes asserted again.
>>>>> +		 */
>>>>> +		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
>>>>> +			irq->line_level = vgic_irq_line_level(irq);
>>>>
>>>> I don't see anything related to the GICD_ISPENDING/ICPENDING.
>>>
>>> vgic_irq_line_level() reads GICD_ISPEND.  Not sure what you mean?
>>>
>>>> In the
>>>> last thread, we said we were obliged to set the pending bit on the
>>>> physical distributor to avoid the guest deactivating a non active
>>>> physical IRQ.
>>>
>>> Did we?  I don't remember this, and this doesn't make sense to me.  The
>>> only constraint I think we have is that when setting the HW bit in the
>>> LR, the interrupt must be active on the physical side, so that a
>>> deactivate by the guest is also a deactivate on the physical side.
>>
>> That what I understood from this thread:
>> https://lkml.org/lkml/2017/7/25/534
> 
> Right, but that was specific to how we handle the guest writing to
> PEND/ACT on the virtual distributor.
> 
>>
>> At the moment the LR HW bit is set whatever the source of the IRQ I
>> think, HW driven or ISPENDRn driven (we only test irq->hw)
>>
> 
> Ah, I didn't actually consider that we could special-case the
> pending_latch for a HW interrupt.  I suppose we should actually consider
> that if someone tries to map an edge-triggered SPI.  Let me have a look
> at that.
> 
>>
>>>
>>>> If we still intend to do so, with that addition, aren't we
>>>> likely to deactivate the physical IRQ before the guest?
>>>
>>> Do you mean if it's pending+active in the LR?  That can never happen for
>>> a mapped (HW bit set) interrupt.
>>>
>>> So this particular code sequence means that
>>>     (val & ICH_LR_STATE) == ICH_LR_PENDING_BIT
>>>
>>> and we never clear the physical active state while the virtual interrupt
>>> is active.
>>
>> No my scenario was:
>> IRQ originates from a ISPENDRn.
> 
> virtual IRQ from a write to the VGIC ISPENDRn, correct?
correct.
> 
>> I Thought we set the physical
>> distributor state accordingly. pIRQ is active. vIRQ is pending. On fold
>> we detect the line is low and we deactivate the pIRQ. pending_latch is
>> still true. The vIRQ will be processed but its physical IRQ has been
>> deactivated.
>>
> 
> As long as we properly handle pending_latch and guest writes to the VGIC
> PEND/ACT registers, I don't see a problem with this code.
to me this depends if the LR HW bit is set.
> 
>>> There is an interesting point in terms of handling guest accesses to the
>>> virtual distributor GICD_ISPENDR, GICD_ICPENDR, GICD_ISACTIVER, and
>>> GICD_ICACTIVER where we must also propogate those changes to the
>>> physical side.
>>>
>>> Was this what you meant?
>> yes I think we also need to address this to get the whole picture.
>>
> 
> Yes, I'll respin.
OK

Thanks

Eric
> 
> Thanks,
> -Christoffer
>
Marc Zyngier Sept. 2, 2017, 10:52 a.m. UTC | #6
On Tue, Aug 29 2017 at 11:39:00 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> Level-triggered mapped IRQs are special because we only observe rising
> edges as input to the VGIC, and we don't set the EOI flag and therefore
> are not told when the level goes down, so that we can re-queue a new
> interrupt when the level goes up.
>
> One way to solve this problem is to side-step the logic of the VGIC and
> special case the validation in the injection path, but it has the
> unfortunate drawback of having to peak into the physical GIC state
> whenever we want to know if the interrupt is pending on the virtual
> distributor.
>
> Instead, we can maintain the current semantics of a level triggered
> interrupt by sort of treating it as an edge-triggered interrupt,
> following from the fact that we only observe an asserting edge.  This

Would it be worth mentioning that this is a consequence of offloading
part of the flow handling to the HW?

> requires us to be a bit careful when populating the LRs and when folding
> the state back in though:
>
>  * We lower the line level when populating the LR, so that when
>    subsequently observing an asserting edge, the VGIC will do the right
>    thing.
>
>  * If the guest never acked the interrupt while running (for example if
>    it had masked interrupts at the CPU level while running), we have
>    to preserve the pending state of the LR and move it back to the
>    line_level field of the struct irq when folding LR state.
>
>    If the guest never acked the interrupt while running, but changed the
>    device state and lowered the line (again with interrupts masked) then
>    we need to observe this change in the line_level.
>
>    Both of the above situations are solved by sampling the physical line
>    and set the line level when folding the LR back.
>
>  * Finally, if the guest never acked the interrupt while running and
>    sampling the line reveals that the device state has changed and the
>    line has been lowered, we must clear the physical active state, since
>    we will otherwise never be told when the interrupt becomes asserted
>    again.
>
> This has the added benefit of making the timer optimization patches
> (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
> bit simpler, because the timer code doesn't have to clear the active
> state on the sync anymore.  It also potentially improves the performance
> of the timer implementation because the GIC knows the state or the LR
> and only needs to clear the
> active state when the pending bit in the LR is still set, where the
> timer has to always clear it when returning from running the guest with
> an injected timer interrupt.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
>  4 files changed, 88 insertions(+)
>
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e4187e5..f7c5cb5 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  				irq->pending_latch = false;
>  		}
>  
> +		/*
> +		 * Level-triggered mapped IRQs are special because we only
> +		 * observe rising edges as input to the VGIC.
> +		 *
> +		 * If the guest never acked the interrupt we have to sample
> +		 * the physical line and set the line level, because the
> +		 * device state could have changed or we simply need to
> +		 * process the still pending interrupt later.
> +		 *
> +		 * If this causes us to lower the level, we have to also clear
> +		 * the physical active state, since we will otherwise never be
> +		 * told when the interrupt becomes asserted again.
> +		 */
> +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> +			irq->line_level = vgic_irq_line_level(irq);
> +
> +			if (!irq->line_level)
> +				vgic_irq_clear_phys_active(irq);
> +		}
> +
>  		spin_unlock(&irq->irq_lock);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> @@ -161,6 +181,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  			val |= GICH_LR_EOI;
>  	}
>  
> +	/*
> +	 * Level-triggered mapped IRQs are special because we only observe
> +	 * rising edges as input to the VGIC.  We therefore lower the line
> +	 * level here, so that we can take new virtual IRQs.  See
> +	 * vgic_v2_fold_lr_state for more info.
> +	 */
> +	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
> +		irq->line_level = false;
> +
>  	/* The GICv2 LR only holds five bits of priority. */
>  	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 96ea597..e377036 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -94,6 +94,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  				irq->pending_latch = false;
>  		}
>  
> +		/*
> +		 * Level-triggered mapped IRQs are special because we only
> +		 * observe rising edges as input to the VGIC.
> +		 *
> +		 * If the guest never acked the interrupt we have to sample
> +		 * the physical line and set the line level, because the
> +		 * device state could have changed or we simply need to
> +		 * process the still pending interrupt later.
> +		 *
> +		 * If this causes us to lower the level, we have to also clear
> +		 * the physical active state, since we will otherwise never be
> +		 * told when the interrupt becomes asserted again.
> +		 */
> +		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
> +			irq->line_level = vgic_irq_line_level(irq);
> +
> +			if (!irq->line_level)
> +				vgic_irq_clear_phys_active(irq);
> +		}
> +
>  		spin_unlock(&irq->irq_lock);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> @@ -144,6 +164,15 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  	}
>  
>  	/*
> +	 * Level-triggered mapped IRQs are special because we only observe
> +	 * rising edges as input to the VGIC.  We therefore lower the line
> +	 * level here, so that we can take new virtual IRQs.  See
> +	 * vgic_v3_fold_lr_state for more info.
> +	 */
> +	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
> +		irq->line_level = false;
> +
> +	/*
>  	 * We currently only support Group1 interrupts, which is a
>  	 * known defect. This needs to be addressed at some point.
>  	 */
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 9d557efd..2691a0a 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -140,6 +140,29 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>  	kfree(irq);
>  }
>  
> +/* Get the input level of a mapped IRQ directly from the physical GIC */
> +bool vgic_irq_line_level(struct vgic_irq *irq)

nit: it would make the above clearer if this was named something like
vgic_irq_get_phys_line_level(), as the current name is pretty ambiguous
about which side of the GIC we're looking at.

> +{
> +	bool line_level;
> +
> +	BUG_ON(!irq->hw);
> +
> +	WARN_ON(irq_get_irqchip_state(irq->host_irq,
> +				      IRQCHIP_STATE_PENDING,
> +				      &line_level));
> +	return line_level;
> +}
> +
> +/* Clear the physical active state */
> +void vgic_irq_clear_phys_active(struct vgic_irq *irq)
> +{
> +
> +	BUG_ON(!irq->hw);
> +	WARN_ON(irq_set_irqchip_state(irq->host_irq,
> +				      IRQCHIP_STATE_ACTIVE,
> +				      false));
> +}
> +
>  /**
>   * kvm_vgic_target_oracle - compute the target vcpu for an irq
>   *
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index bba7fa2..22f106d 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -104,6 +104,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
>  		return irq->pending_latch || irq->line_level;
>  }
>  
> +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
> +{
> +	return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
> +}
> +
>  /*
>   * This struct provides an intermediate representation of the fields contained
>   * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
> @@ -140,6 +145,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid);
>  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> +bool vgic_irq_line_level(struct vgic_irq *irq);
> +void vgic_irq_clear_phys_active(struct vgic_irq *irq);
>  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
>  void vgic_kick_vcpus(struct kvm *kvm);

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
Marc Zyngier Sept. 2, 2017, 11:04 a.m. UTC | #7
On Tue, Aug 29 2017 at 11:39:00 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> Level-triggered mapped IRQs are special because we only observe rising
> edges as input to the VGIC, and we don't set the EOI flag and therefore
> are not told when the level goes down, so that we can re-queue a new
> interrupt when the level goes up.
>
> One way to solve this problem is to side-step the logic of the VGIC and
> special case the validation in the injection path, but it has the
> unfortunate drawback of having to peak into the physical GIC state
> whenever we want to know if the interrupt is pending on the virtual
> distributor.
>
> Instead, we can maintain the current semantics of a level triggered
> interrupt by sort of treating it as an edge-triggered interrupt,
> following from the fact that we only observe an asserting edge.  This
> requires us to be a bit careful when populating the LRs and when folding
> the state back in though:
>
>  * We lower the line level when populating the LR, so that when
>    subsequently observing an asserting edge, the VGIC will do the right
>    thing.
>
>  * If the guest never acked the interrupt while running (for example if
>    it had masked interrupts at the CPU level while running), we have
>    to preserve the pending state of the LR and move it back to the
>    line_level field of the struct irq when folding LR state.
>
>    If the guest never acked the interrupt while running, but changed the
>    device state and lowered the line (again with interrupts masked) then
>    we need to observe this change in the line_level.
>
>    Both of the above situations are solved by sampling the physical line
>    and set the line level when folding the LR back.
>
>  * Finally, if the guest never acked the interrupt while running and
>    sampling the line reveals that the device state has changed and the
>    line has been lowered, we must clear the physical active state, since
>    we will otherwise never be told when the interrupt becomes asserted
>    again.
>
> This has the added benefit of making the timer optimization patches
> (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
> bit simpler, because the timer code doesn't have to clear the active
> state on the sync anymore.  It also potentially improves the performance
> of the timer implementation because the GIC knows the state or the LR
> and only needs to clear the
> active state when the pending bit in the LR is still set, where the
> timer has to always clear it when returning from running the guest with
> an injected timer interrupt.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
>  4 files changed, 88 insertions(+)
>
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e4187e5..f7c5cb5 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  				irq->pending_latch = false;
>  		}
>  
> +		/*
> +		 * Level-triggered mapped IRQs are special because we only
> +		 * observe rising edges as input to the VGIC.
> +		 *
> +		 * If the guest never acked the interrupt we have to sample
> +		 * the physical line and set the line level, because the
> +		 * device state could have changed or we simply need to
> +		 * process the still pending interrupt later.
> +		 *
> +		 * If this causes us to lower the level, we have to also clear
> +		 * the physical active state, since we will otherwise never be
> +		 * told when the interrupt becomes asserted again.
> +		 */
> +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {

I've just had a worrying though. Here, we're looking at the guest's view
of the trigger. But shouldn't that be the *host's*? We're assuming that
the two should match, and while they certainly do for the timer, this is
not something that can be enforced for SPIs.

What do you think?

	M.
Christoffer Dall Sept. 2, 2017, 8:23 p.m. UTC | #8
On Sat, Sep 02, 2017 at 12:04:06PM +0100, Marc Zyngier wrote:
> On Tue, Aug 29 2017 at 11:39:00 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> > Level-triggered mapped IRQs are special because we only observe rising
> > edges as input to the VGIC, and we don't set the EOI flag and therefore
> > are not told when the level goes down, so that we can re-queue a new
> > interrupt when the level goes up.
> >
> > One way to solve this problem is to side-step the logic of the VGIC and
> > special case the validation in the injection path, but it has the
> > unfortunate drawback of having to peak into the physical GIC state
> > whenever we want to know if the interrupt is pending on the virtual
> > distributor.
> >
> > Instead, we can maintain the current semantics of a level triggered
> > interrupt by sort of treating it as an edge-triggered interrupt,
> > following from the fact that we only observe an asserting edge.  This
> > requires us to be a bit careful when populating the LRs and when folding
> > the state back in though:
> >
> >  * We lower the line level when populating the LR, so that when
> >    subsequently observing an asserting edge, the VGIC will do the right
> >    thing.
> >
> >  * If the guest never acked the interrupt while running (for example if
> >    it had masked interrupts at the CPU level while running), we have
> >    to preserve the pending state of the LR and move it back to the
> >    line_level field of the struct irq when folding LR state.
> >
> >    If the guest never acked the interrupt while running, but changed the
> >    device state and lowered the line (again with interrupts masked) then
> >    we need to observe this change in the line_level.
> >
> >    Both of the above situations are solved by sampling the physical line
> >    and set the line level when folding the LR back.
> >
> >  * Finally, if the guest never acked the interrupt while running and
> >    sampling the line reveals that the device state has changed and the
> >    line has been lowered, we must clear the physical active state, since
> >    we will otherwise never be told when the interrupt becomes asserted
> >    again.
> >
> > This has the added benefit of making the timer optimization patches
> > (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
> > bit simpler, because the timer code doesn't have to clear the active
> > state on the sync anymore.  It also potentially improves the performance
> > of the timer implementation because the GIC knows the state or the LR
> > and only needs to clear the
> > active state when the pending bit in the LR is still set, where the
> > timer has to always clear it when returning from running the guest with
> > an injected timer interrupt.
> >
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
> >  4 files changed, 88 insertions(+)
> >
> > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> > index e4187e5..f7c5cb5 100644
> > --- a/virt/kvm/arm/vgic/vgic-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-v2.c
> > @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > +		/*
> > +		 * Level-triggered mapped IRQs are special because we only
> > +		 * observe rising edges as input to the VGIC.
> > +		 *
> > +		 * If the guest never acked the interrupt we have to sample
> > +		 * the physical line and set the line level, because the
> > +		 * device state could have changed or we simply need to
> > +		 * process the still pending interrupt later.
> > +		 *
> > +		 * If this causes us to lower the level, we have to also clear
> > +		 * the physical active state, since we will otherwise never be
> > +		 * told when the interrupt becomes asserted again.
> > +		 */
> > +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> 
> I've just had a worrying though. Here, we're looking at the guest's view
> of the trigger. But shouldn't that be the *host's*? We're assuming that
> the two should match, and while they certainly do for the timer, this is
> not something that can be enforced for SPIs.
> 
> What do you think?
> 
This is a tricky one.  I think that in general, if the guest
misconfigures the level/edge of a virtual interrupt, it is only setting
itself up for trouble, which doesn't interfere with the host
functionality, so we don't have to worry about that.  What we do have to
worry is about is that the guest can't interfere with the host in
uninteded ways by playing with the interrupts.  Let's consider the two
scenarios:

Phys = level, Virt = Edge
-------------------------
These patches don't change anything from today's behavior.  We may loose
interrupts for the VM, tough.

Phys = edge, Virt = Level
-------------------------
In this case we will read the physical input level on return from the
VM.  That changes the virtual line_level, no harm done.  If we see the
input line to the VGIC (or physial pending state) is asserted, we will
deactivate a physical interrup which means, if it's actually pending,
the driver (VFIO or arch timer, typically) will take another
interrupt, no harm done.

So I think we're fine.


Thanks,
-Christoffer
Christoffer Dall Sept. 2, 2017, 8:37 p.m. UTC | #9
On Sat, Sep 02, 2017 at 11:52:57AM +0100, Marc Zyngier wrote:
> On Tue, Aug 29 2017 at 11:39:00 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> > Level-triggered mapped IRQs are special because we only observe rising
> > edges as input to the VGIC, and we don't set the EOI flag and therefore
> > are not told when the level goes down, so that we can re-queue a new
> > interrupt when the level goes up.
> >
> > One way to solve this problem is to side-step the logic of the VGIC and
> > special case the validation in the injection path, but it has the
> > unfortunate drawback of having to peak into the physical GIC state
> > whenever we want to know if the interrupt is pending on the virtual
> > distributor.
> >
> > Instead, we can maintain the current semantics of a level triggered
> > interrupt by sort of treating it as an edge-triggered interrupt,
> > following from the fact that we only observe an asserting edge.  This
> 
> Would it be worth mentioning that this is a consequence of offloading
> part of the flow handling to the HW?
> 

Possibly, but I'm not really sure what that actually means.  Can you
explain a little more, and I'll be happy to add the text?

> > requires us to be a bit careful when populating the LRs and when folding
> > the state back in though:
> >
> >  * We lower the line level when populating the LR, so that when
> >    subsequently observing an asserting edge, the VGIC will do the right
> >    thing.
> >
> >  * If the guest never acked the interrupt while running (for example if
> >    it had masked interrupts at the CPU level while running), we have
> >    to preserve the pending state of the LR and move it back to the
> >    line_level field of the struct irq when folding LR state.
> >
> >    If the guest never acked the interrupt while running, but changed the
> >    device state and lowered the line (again with interrupts masked) then
> >    we need to observe this change in the line_level.
> >
> >    Both of the above situations are solved by sampling the physical line
> >    and set the line level when folding the LR back.
> >
> >  * Finally, if the guest never acked the interrupt while running and
> >    sampling the line reveals that the device state has changed and the
> >    line has been lowered, we must clear the physical active state, since
> >    we will otherwise never be told when the interrupt becomes asserted
> >    again.
> >
> > This has the added benefit of making the timer optimization patches
> > (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
> > bit simpler, because the timer code doesn't have to clear the active
> > state on the sync anymore.  It also potentially improves the performance
> > of the timer implementation because the GIC knows the state or the LR
> > and only needs to clear the
> > active state when the pending bit in the LR is still set, where the
> > timer has to always clear it when returning from running the guest with
> > an injected timer interrupt.
> >
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic/vgic-v2.c | 29 +++++++++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic-v3.c | 29 +++++++++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic.c    | 23 +++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic.h    |  7 +++++++
> >  4 files changed, 88 insertions(+)
> >
> > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> > index e4187e5..f7c5cb5 100644
> > --- a/virt/kvm/arm/vgic/vgic-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-v2.c
> > @@ -104,6 +104,26 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > +		/*
> > +		 * Level-triggered mapped IRQs are special because we only
> > +		 * observe rising edges as input to the VGIC.
> > +		 *
> > +		 * If the guest never acked the interrupt we have to sample
> > +		 * the physical line and set the line level, because the
> > +		 * device state could have changed or we simply need to
> > +		 * process the still pending interrupt later.
> > +		 *
> > +		 * If this causes us to lower the level, we have to also clear
> > +		 * the physical active state, since we will otherwise never be
> > +		 * told when the interrupt becomes asserted again.
> > +		 */
> > +		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> > +			irq->line_level = vgic_irq_line_level(irq);
> > +
> > +			if (!irq->line_level)
> > +				vgic_irq_clear_phys_active(irq);
> > +		}
> > +
> >  		spin_unlock(&irq->irq_lock);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -161,6 +181,15 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
> >  			val |= GICH_LR_EOI;
> >  	}
> >  
> > +	/*
> > +	 * Level-triggered mapped IRQs are special because we only observe
> > +	 * rising edges as input to the VGIC.  We therefore lower the line
> > +	 * level here, so that we can take new virtual IRQs.  See
> > +	 * vgic_v2_fold_lr_state for more info.
> > +	 */
> > +	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
> > +		irq->line_level = false;
> > +
> >  	/* The GICv2 LR only holds five bits of priority. */
> >  	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> > index 96ea597..e377036 100644
> > --- a/virt/kvm/arm/vgic/vgic-v3.c
> > +++ b/virt/kvm/arm/vgic/vgic-v3.c
> > @@ -94,6 +94,26 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > +		/*
> > +		 * Level-triggered mapped IRQs are special because we only
> > +		 * observe rising edges as input to the VGIC.
> > +		 *
> > +		 * If the guest never acked the interrupt we have to sample
> > +		 * the physical line and set the line level, because the
> > +		 * device state could have changed or we simply need to
> > +		 * process the still pending interrupt later.
> > +		 *
> > +		 * If this causes us to lower the level, we have to also clear
> > +		 * the physical active state, since we will otherwise never be
> > +		 * told when the interrupt becomes asserted again.
> > +		 */
> > +		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
> > +			irq->line_level = vgic_irq_line_level(irq);
> > +
> > +			if (!irq->line_level)
> > +				vgic_irq_clear_phys_active(irq);
> > +		}
> > +
> >  		spin_unlock(&irq->irq_lock);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -144,6 +164,15 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
> >  	}
> >  
> >  	/*
> > +	 * Level-triggered mapped IRQs are special because we only observe
> > +	 * rising edges as input to the VGIC.  We therefore lower the line
> > +	 * level here, so that we can take new virtual IRQs.  See
> > +	 * vgic_v3_fold_lr_state for more info.
> > +	 */
> > +	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
> > +		irq->line_level = false;
> > +
> > +	/*
> >  	 * We currently only support Group1 interrupts, which is a
> >  	 * known defect. This needs to be addressed at some point.
> >  	 */
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 9d557efd..2691a0a 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -140,6 +140,29 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> >  	kfree(irq);
> >  }
> >  
> > +/* Get the input level of a mapped IRQ directly from the physical GIC */
> > +bool vgic_irq_line_level(struct vgic_irq *irq)
> 
> nit: it would make the above clearer if this was named something like
> vgic_irq_get_phys_line_level(), as the current name is pretty ambiguous
> about which side of the GIC we're looking at.
> 
Yes, absolutely.

> > +{
> > +	bool line_level;
> > +
> > +	BUG_ON(!irq->hw);
> > +
> > +	WARN_ON(irq_get_irqchip_state(irq->host_irq,
> > +				      IRQCHIP_STATE_PENDING,
> > +				      &line_level));
> > +	return line_level;
> > +}
> > +
> > +/* Clear the physical active state */
> > +void vgic_irq_clear_phys_active(struct vgic_irq *irq)
> > +{
> > +
> > +	BUG_ON(!irq->hw);
> > +	WARN_ON(irq_set_irqchip_state(irq->host_irq,
> > +				      IRQCHIP_STATE_ACTIVE,
> > +				      false));
> > +}
> > +
> >  /**
> >   * kvm_vgic_target_oracle - compute the target vcpu for an irq
> >   *
> > diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> > index bba7fa2..22f106d 100644
> > --- a/virt/kvm/arm/vgic/vgic.h
> > +++ b/virt/kvm/arm/vgic/vgic.h
> > @@ -104,6 +104,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
> >  		return irq->pending_latch || irq->line_level;
> >  }
> >  
> > +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
> > +{
> > +	return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
> > +}
> > +
> >  /*
> >   * This struct provides an intermediate representation of the fields contained
> >   * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
> > @@ -140,6 +145,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
> >  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> >  			      u32 intid);
> >  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> > +bool vgic_irq_line_level(struct vgic_irq *irq);
> > +void vgic_irq_clear_phys_active(struct vgic_irq *irq);
> >  bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
> >  void vgic_kick_vcpus(struct kvm *kvm);
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> 

Thanks!
-Christoffer
diff mbox

Patch

diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index e4187e5..f7c5cb5 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -104,6 +104,26 @@  void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 				irq->pending_latch = false;
 		}
 
+		/*
+		 * Level-triggered mapped IRQs are special because we only
+		 * observe rising edges as input to the VGIC.
+		 *
+		 * If the guest never acked the interrupt we have to sample
+		 * the physical line and set the line level, because the
+		 * device state could have changed or we simply need to
+		 * process the still pending interrupt later.
+		 *
+		 * If this causes us to lower the level, we have to also clear
+		 * the physical active state, since we will otherwise never be
+		 * told when the interrupt becomes asserted again.
+		 */
+		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
+			irq->line_level = vgic_irq_line_level(irq);
+
+			if (!irq->line_level)
+				vgic_irq_clear_phys_active(irq);
+		}
+
 		spin_unlock(&irq->irq_lock);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
@@ -161,6 +181,15 @@  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
 			val |= GICH_LR_EOI;
 	}
 
+	/*
+	 * Level-triggered mapped IRQs are special because we only observe
+	 * rising edges as input to the VGIC.  We therefore lower the line
+	 * level here, so that we can take new virtual IRQs.  See
+	 * vgic_v2_fold_lr_state for more info.
+	 */
+	if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT))
+		irq->line_level = false;
+
 	/* The GICv2 LR only holds five bits of priority. */
 	val |= (irq->priority >> 3) << GICH_LR_PRIORITY_SHIFT;
 
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 96ea597..e377036 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -94,6 +94,26 @@  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 				irq->pending_latch = false;
 		}
 
+		/*
+		 * Level-triggered mapped IRQs are special because we only
+		 * observe rising edges as input to the VGIC.
+		 *
+		 * If the guest never acked the interrupt we have to sample
+		 * the physical line and set the line level, because the
+		 * device state could have changed or we simply need to
+		 * process the still pending interrupt later.
+		 *
+		 * If this causes us to lower the level, we have to also clear
+		 * the physical active state, since we will otherwise never be
+		 * told when the interrupt becomes asserted again.
+		 */
+		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
+			irq->line_level = vgic_irq_line_level(irq);
+
+			if (!irq->line_level)
+				vgic_irq_clear_phys_active(irq);
+		}
+
 		spin_unlock(&irq->irq_lock);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
@@ -144,6 +164,15 @@  void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
 	}
 
 	/*
+	 * Level-triggered mapped IRQs are special because we only observe
+	 * rising edges as input to the VGIC.  We therefore lower the line
+	 * level here, so that we can take new virtual IRQs.  See
+	 * vgic_v3_fold_lr_state for more info.
+	 */
+	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
+		irq->line_level = false;
+
+	/*
 	 * We currently only support Group1 interrupts, which is a
 	 * known defect. This needs to be addressed at some point.
 	 */
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 9d557efd..2691a0a 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -140,6 +140,29 @@  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 	kfree(irq);
 }
 
+/* Get the input level of a mapped IRQ directly from the physical GIC */
+bool vgic_irq_line_level(struct vgic_irq *irq)
+{
+	bool line_level;
+
+	BUG_ON(!irq->hw);
+
+	WARN_ON(irq_get_irqchip_state(irq->host_irq,
+				      IRQCHIP_STATE_PENDING,
+				      &line_level));
+	return line_level;
+}
+
+/* Clear the physical active state */
+void vgic_irq_clear_phys_active(struct vgic_irq *irq)
+{
+
+	BUG_ON(!irq->hw);
+	WARN_ON(irq_set_irqchip_state(irq->host_irq,
+				      IRQCHIP_STATE_ACTIVE,
+				      false));
+}
+
 /**
  * kvm_vgic_target_oracle - compute the target vcpu for an irq
  *
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index bba7fa2..22f106d 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -104,6 +104,11 @@  static inline bool irq_is_pending(struct vgic_irq *irq)
 		return irq->pending_latch || irq->line_level;
 }
 
+static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
+{
+	return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
+}
+
 /*
  * This struct provides an intermediate representation of the fields contained
  * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
@@ -140,6 +145,8 @@  vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
 struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
 			      u32 intid);
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
+bool vgic_irq_line_level(struct vgic_irq *irq);
+void vgic_irq_clear_phys_active(struct vgic_irq *irq);
 bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
 void vgic_kick_vcpus(struct kvm *kvm);