
[v4,6/9] KVM: arm64: vgic: Implement SW-driven deactivation

Message ID 20210601104005.81332-7-maz@kernel.org (mailing list archive)
State New, archived
Series KVM: arm64: Initial host support for the Apple M1

Commit Message

Marc Zyngier June 1, 2021, 10:40 a.m. UTC
In order to deal with these systems that do not offer HW-based
deactivation of interrupts, let's implement a SW-based approach:

- When the irq is queued into a LR, treat it as a pure virtual
  interrupt and set the EOI flag in the LR.

- When the interrupt state is read back from the LR, force a
  deactivation when the state is invalid (neither active nor
  pending)

Interrupts requiring such treatment get the VGIC_IRQ_SW_RESAMPLE flag.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-v2.c | 19 +++++++++++++++----
 arch/arm64/kvm/vgic/vgic-v3.c | 19 +++++++++++++++----
 include/kvm/arm_vgic.h        | 10 ++++++++++
 3 files changed, 40 insertions(+), 8 deletions(-)
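The two rules in the commit message can be illustrated with a small standalone C model of the decision taken when an LR is folded back. This is a sketch, not kernel code: `struct model_irq` and `fold_should_deactivate()` are hypothetical names, and the physical line level is passed in as `phys_level` instead of being read through vgic_get_phys_line_level().

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the struct vgic_irq fields the patch uses. */
struct model_irq {
	bool mapped_level;     /* HW-mapped and level-triggered */
	bool needs_resampling; /* VGIC_IRQ_SW_RESAMPLE set in irq->ops->flags */
	bool active;           /* active state as folded back from the LR */
	bool pending_latch;    /* latched pending state (e.g. a GICD_ISPENDR write) */
	bool line_level;       /* last known level of the physical input line */
};

/*
 * Model of the 'resample' decision added to vgic_v{2,3}_fold_lr_state().
 * Returns true when the physical active state should be cleared.
 * 'lr_pending' is the LR pending bit; 'phys_level' is a pre-sampled
 * physical line level (an assumption of this model).
 */
bool fold_should_deactivate(struct model_irq *irq, bool lr_pending,
			    bool phys_level)
{
	bool resample = false;

	if (!irq->mapped_level)
		return false;

	if (lr_pending) {
		/* Still pending in the LR: refresh the level, deactivate if low. */
		irq->line_level = phys_level;
		resample = !irq->line_level;
	} else if (irq->needs_resampling &&
		   !(irq->active || irq->pending_latch)) {
		/* No HW deactivation and the state is neither active nor pending. */
		resample = true;
	}

	return resample;
}
```

Under these assumptions, an interrupt without the flag behaves as it did before the patch: only a pending LR can lead to a deactivation, and only when the sampled line is low.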

Comments

Alexandru Elisei June 17, 2021, 2:58 p.m. UTC | #1
Hi Marc,

On 6/1/21 11:40 AM, Marc Zyngier wrote:
> In order to deal with these systems that do not offer HW-based
> deactivation of interrupts, let implement a SW-based approach:

Nitpick, but shouldn't that be "let's"?

>
> - When the irq is queued into a LR, treat it as a pure virtual
>   interrupt and set the EOI flag in the LR.
>
> - When the interrupt state is read back from the LR, force a
>   deactivation when the state is invalid (neither active nor
>   pending)
>
> Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/vgic/vgic-v2.c | 19 +++++++++++++++----
>  arch/arm64/kvm/vgic/vgic-v3.c | 19 +++++++++++++++----
>  include/kvm/arm_vgic.h        | 10 ++++++++++
>  3 files changed, 40 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c
> index 11934c2af2f4..2c580204f1dc 100644
> --- a/arch/arm64/kvm/vgic/vgic-v2.c
> +++ b/arch/arm64/kvm/vgic/vgic-v2.c
> @@ -108,11 +108,22 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  		 * If this causes us to lower the level, we have to also clear
>  		 * the physical active state, since we will otherwise never be
>  		 * told when the interrupt becomes asserted again.
> +		 *
> +		 * Another case is when the interrupt requires a helping hand
> +		 * on deactivation (no HW deactivation, for example).
>  		 */
> -		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> -			irq->line_level = vgic_get_phys_line_level(irq);
> +		if (vgic_irq_is_mapped_level(irq)) {
> +			bool resample = false;
> +
> +			if (val & GICH_LR_PENDING_BIT) {
> +				irq->line_level = vgic_get_phys_line_level(irq);
> +				resample = !irq->line_level;
> +			} else if (vgic_irq_needs_resampling(irq) &&
> +				   !(irq->active || irq->pending_latch)) {

I'm having a hard time figuring out when and why a level sensitive can have
pending_latch = true.

I looked at kvm_vgic_inject_irq(), and that function sets pending_latch only for edge
triggered interrupts (it sets line_level for level sensitive ones). But
irq_is_pending() looks at **both** pending_latch and line_level for level
sensitive interrupts.

The only place that I've found that sets pending_latch regardless of the interrupt
type is in vgic_mmio_write_spending() (called on a trapped write to
GICD_ISENABLER). vgic_v2_populate_lr() clears pending_latch only for edge
triggered interrupts, so that leaves vgic_v2_fold_lr_state() as the only function
where pending_latch is cleared for level sensitive interrupts, when the interrupt has
been handled by the guest. Are we doing all of this to emulate the fact that level
sensitive interrupts (either purely virtual or hw mapped) made pending by a write
to GICD_ISENABLER remain pending until they are handled by the guest?

If that is the case, then I think this is what the code is doing:

- There's no functional change when the irqchip has HW deactivation

- For level sensitive, hw mapped interrupts made pending by a write to
GICD_ISENABLER and not yet handled by the guest (pending_latch == true) we don't
clear the pending state of the interrupt.

- For level sensitive, hw mapped interrupts we clear the pending state in the GIC
and the device will assert the interrupt again if it's still pending at the device
level. I have a question about this. Why don't we sample the interrupt state by
calling vgic_get_phys_line_level()? Because that would be slower than the
alternative that you are proposing here?

> +				resample = true;
> +			}
>  
> -			if (!irq->line_level)
> +			if (resample)
>  				vgic_irq_set_phys_active(irq, false);
>  		}
>  
> @@ -152,7 +163,7 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  	if (irq->group)
>  		val |= GICH_LR_GROUP1;
>  
> -	if (irq->hw) {
> +	if (irq->hw && !vgic_irq_needs_resampling(irq)) {

This looks good, we set the EOI bit in the LR register in the case of purely
virtual level sensitive interrupts or for HW mapped level sensitive on systems
where the GIC doesn't have the mandatory HW deactivation architectural feature.
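The populate_lr() change can be modelled as a choice between two LR encodings. The sketch below is hypothetical: the MODEL_LR_* constants are illustrative values patterned on the GICv2 GICH_LR layout, not definitions taken from the kernel headers, and only the bits relevant to this patch are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout, patterned on GICv2 GICH_LR; not taken from kernel headers. */
#define MODEL_LR_VIRTUALID  0x3ffu
#define MODEL_LR_PHYS_SHIFT 10
#define MODEL_LR_EOI        (1u << 19)
#define MODEL_LR_HW         (1u << 31)

/*
 * Hypothetical model of the part of vgic_v2_populate_lr() touched by the
 * patch: an interrupt that needs SW resampling is encoded like a purely
 * virtual one, even if it is HW-mapped.
 */
uint32_t model_populate_lr(bool hw, bool needs_resampling, bool level,
			   uint32_t vintid, uint32_t hwintid)
{
	uint32_t val = vintid & MODEL_LR_VIRTUALID;

	if (hw && !needs_resampling) {
		/* Guest deactivation is forwarded to the physical interrupt. */
		val |= MODEL_LR_HW;
		val |= (hwintid & MODEL_LR_VIRTUALID) << MODEL_LR_PHYS_SHIFT;
	} else if (level) {
		/* Ask for a maintenance interrupt when the guest EOIs it. */
		val |= MODEL_LR_EOI;
	}

	return val;
}
```

With the flag set, the guest never deactivates the physical interrupt directly; as the commit message describes, that is left to the fold step once the LR state reads back as invalid.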

>  		val |= GICH_LR_HW;
>  		val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
>  		/*
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 41ecf219c333..66004f61cd83 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -101,11 +101,22 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  		 * If this causes us to lower the level, we have to also clear
>  		 * the physical active state, since we will otherwise never be
>  		 * told when the interrupt becomes asserted again.
> +		 *
> +		 * Another case is when the interrupt requires a helping hand
> +		 * on deactivation (no HW deactivation, for example).
>  		 */
> -		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
> -			irq->line_level = vgic_get_phys_line_level(irq);
> +		if (vgic_irq_is_mapped_level(irq)) {
> +			bool resample = false;
> +
> +			if (val & ICH_LR_PENDING_BIT) {
> +				irq->line_level = vgic_get_phys_line_level(irq);
> +				resample = !irq->line_level;
> +			} else if (vgic_irq_needs_resampling(irq) &&
> +				   !(irq->active || irq->pending_latch)) {
> +				resample = true;
> +			}
>  
> -			if (!irq->line_level)
> +			if (resample)
>  				vgic_irq_set_phys_active(irq, false);
>  		}
>  
> @@ -136,7 +147,7 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  		}
>  	}
>  
> -	if (irq->hw) {
> +	if (irq->hw && !vgic_irq_needs_resampling(irq)) {

Both changes to the vGICv3 code look identical to the vGICv2 changes.

Thanks,

Alex

>  		val |= ICH_LR_HW;
>  		val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
>  		/*
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index e5f06df000f2..e602d848fc1a 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -99,6 +99,11 @@ enum vgic_irq_config {
>   * kvm_arm_get_running_vcpu() to get the vcpu pointer for private IRQs.
>   */
>  struct irq_ops {
> +	/* Per interrupt flags for special-cased interrupts */
> +	unsigned long flags;
> +
> +#define VGIC_IRQ_SW_RESAMPLE	BIT(0)	/* Clear the active state for resampling */
> +
>  	/*
>  	 * Callback function pointer to in-kernel devices that can tell us the
>  	 * state of the input level of mapped level-triggered IRQ faster than
> @@ -150,6 +155,11 @@ struct vgic_irq {
>  					   for in-kernel devices. */
>  };
>  
> +static inline bool vgic_irq_needs_resampling(struct vgic_irq *irq)
> +{
> +	return irq->ops && (irq->ops->flags & VGIC_IRQ_SW_RESAMPLE);
> +}
> +
>  struct vgic_register_region;
>  struct vgic_its;
>
Marc Zyngier June 22, 2021, 4:12 p.m. UTC | #2
On Thu, 17 Jun 2021 15:58:31 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On 6/1/21 11:40 AM, Marc Zyngier wrote:
> > In order to deal with these systems that do not offer HW-based
> > deactivation of interrupts, let implement a SW-based approach:
> 
> Nitpick, but shouldn't that be "let's"?

"Let it be...". ;-) Yup.

> 
> >
> > - When the irq is queued into a LR, treat it as a pure virtual
> >   interrupt and set the EOI flag in the LR.
> >
> > - When the interrupt state is read back from the LR, force a
> >   deactivation when the state is invalid (neither active nor
> >   pending)
> >
> > Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/vgic/vgic-v2.c | 19 +++++++++++++++----
> >  arch/arm64/kvm/vgic/vgic-v3.c | 19 +++++++++++++++----
> >  include/kvm/arm_vgic.h        | 10 ++++++++++
> >  3 files changed, 40 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c
> > index 11934c2af2f4..2c580204f1dc 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v2.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v2.c
> > @@ -108,11 +108,22 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  		 * If this causes us to lower the level, we have to also clear
> >  		 * the physical active state, since we will otherwise never be
> >  		 * told when the interrupt becomes asserted again.
> > +		 *
> > +		 * Another case is when the interrupt requires a helping hand
> > +		 * on deactivation (no HW deactivation, for example).
> >  		 */
> > -		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
> > -			irq->line_level = vgic_get_phys_line_level(irq);
> > +		if (vgic_irq_is_mapped_level(irq)) {
> > +			bool resample = false;
> > +
> > +			if (val & GICH_LR_PENDING_BIT) {
> > +				irq->line_level = vgic_get_phys_line_level(irq);
> > +				resample = !irq->line_level;
> > +			} else if (vgic_irq_needs_resampling(irq) &&
> > +				   !(irq->active || irq->pending_latch)) {
> 
> I'm having a hard time figuring out when and why a level sensitive
> can have pending_latch = true.
> 
> I looked kvm_vgic_inject_irq(), and that function sets pending_latch
> only for edge triggered interrupts (it sets line_level for level
> sensitive ones). But irq_is_pending() looks at **both**
> pending_latch and line_level for level sensitive interrupts.

Yes, and that's what an implementation requires.

> The only place that I've found that sets pending_latch regardless of
> the interrupt type is in vgic_mmio_write_spending() (called on a
> trapped write to GICD_ISENABLER).

Are you sure? It really should be GICD_ISPENDR. I'll assume that this
is what you mean below.

> vgic_v2_populate_lr() clears
> pending_latch only for edge triggered interrupts, so that leaves
> vgic_v2_fold_lr_state() as the only function pending_latch is
> cleared for level sensitive interrupts, when the interrupt has been
> handled by the guest. Are we doing all of this to emulate the fact
> that level sensitive interrupts (either purely virtual or hw mapped)
> made pending by a write to GICD_ISENABLER remain pending until they
> are handled by the guest?

Yes, or cleared by a write to GICD_ICPENDR. You really need to think
of the input into the GIC as some sort of OR gate combining both the
line level and the PEND register. With a latch for edge interrupts.

Have a look at Figure 4-10 ("Logic of the pending status of a
level-sensitive interrupt") in the GICv2 arch spec (ARM IHI 0048B.b)
to see what I actually mean.
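The OR-gate description above can be written down as a small C model. It is a sketch under stated assumptions: the names are hypothetical, and the function only mirrors the logic described earlier in the thread for irq_is_pending() (latch only for edge, latch OR line level for level).

```c
#include <stdbool.h>

enum model_config { MODEL_CONFIG_EDGE, MODEL_CONFIG_LEVEL };

/* Hypothetical subset of the per-interrupt state. */
struct model_irq_state {
	enum model_config config;
	bool pending_latch; /* set by an edge or by a GICD_ISPENDR write */
	bool line_level;    /* current level of the input line */
};

/* Pending status: latch only for edge, OR gate of latch and line for level. */
bool model_irq_is_pending(const struct model_irq_state *irq)
{
	if (irq->config == MODEL_CONFIG_EDGE)
		return irq->pending_latch;

	return irq->pending_latch || irq->line_level;
}
```

In this model, a GICD_ICPENDR write that clears `pending_latch` leaves a level interrupt pending for as long as `line_level` stays high, consistent with the OR-gate picture.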

> If that is the case, then I think this is what the code is doing:
> 
> - There's no functional change when the irqchip has HW deactivation
> 
> - For level sensitive, hw mapped interrupts made pending by a write
> to GICD_ISENABLER and not yet handled by the guest (pending_latch ==
> true) we don't clear the pending state of the interrupt.
> 
> - For level sensitive, hw mapped interrupts we clear the pending
> state in the GIC and the device will assert the interrupt again if
> it's still pending at the device level. I have a question about
> this. Why don't we sample the interrupt state by calling
> vgic_get_phys_line_level()? Because that would be slower than the
> alternative that you are proposing here?

Yes. It is *much* faster to read the timer status register (for
example) than going via an MMIO access to read the (re)distributor
that will return the same value.
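The trade-off described here, a cheap device-specific query versus an MMIO access to the GIC, can be sketched as a callback with a fallback. All names are hypothetical; the counter stands in for the expensive (re)distributor read, and the overall shape (use the device's input-level callback when one is registered) is assumed to resemble vgic_get_phys_line_level() rather than copied from it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of struct irq_ops: per-IRQ flags plus an optional fast query. */
struct model_irq_ops {
	unsigned long flags;
	bool (*get_input_level)(int vintid);
};

/* Hypothetical model of a HW-mapped interrupt. */
struct model_mapped_irq {
	int intid;
	const struct model_irq_ops *ops; /* NULL when no device callback is registered */
};

/* Counts how often the simulated slow GIC path is taken. */
int model_slow_reads = 0;

/* Hypothetical stand-in for an MMIO read of the (re)distributor pending state. */
static bool model_read_gic_pending(int intid)
{
	(void)intid;
	model_slow_reads++;
	return false; /* pretend the line is low */
}

/* Hypothetical fast path, e.g. reading a timer's own status register. */
bool model_timer_level(int vintid)
{
	(void)vintid;
	return true; /* pretend the timer condition is still met */
}

/* Prefer the device callback; fall back to the slow GIC query otherwise. */
bool model_get_phys_line_level(const struct model_mapped_irq *irq)
{
	if (irq->ops && irq->ops->get_input_level)
		return irq->ops->get_input_level(irq->intid);
	return model_read_gic_pending(irq->intid);
}
```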

Thanks,

	M.
Alexandru Elisei June 23, 2021, 2:15 p.m. UTC | #3
Hi Marc,

On 6/22/21 5:12 PM, Marc Zyngier wrote:
> On Thu, 17 Jun 2021 15:58:31 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> Hi Marc,
>>
>> On 6/1/21 11:40 AM, Marc Zyngier wrote:
>>> In order to deal with these systems that do not offer HW-based
>>> deactivation of interrupts, let implement a SW-based approach:
>> Nitpick, but shouldn't that be "let's"?
> "Let it be...". ;-) Yup.
>
>>> - When the irq is queued into a LR, treat it as a pure virtual
>>>   interrupt and set the EOI flag in the LR.
>>>
>>> - When the interrupt state is read back from the LR, force a
>>>   deactivation when the state is invalid (neither active nor
>>>   pending)
>>>
>>> Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.
>>>
>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>> ---
>>>  arch/arm64/kvm/vgic/vgic-v2.c | 19 +++++++++++++++----
>>>  arch/arm64/kvm/vgic/vgic-v3.c | 19 +++++++++++++++----
>>>  include/kvm/arm_vgic.h        | 10 ++++++++++
>>>  3 files changed, 40 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c
>>> index 11934c2af2f4..2c580204f1dc 100644
>>> --- a/arch/arm64/kvm/vgic/vgic-v2.c
>>> +++ b/arch/arm64/kvm/vgic/vgic-v2.c
>>> @@ -108,11 +108,22 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>>>  		 * If this causes us to lower the level, we have to also clear
>>>  		 * the physical active state, since we will otherwise never be
>>>  		 * told when the interrupt becomes asserted again.
>>> +		 *
>>> +		 * Another case is when the interrupt requires a helping hand
>>> +		 * on deactivation (no HW deactivation, for example).
>>>  		 */
>>> -		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
>>> -			irq->line_level = vgic_get_phys_line_level(irq);
>>> +		if (vgic_irq_is_mapped_level(irq)) {
>>> +			bool resample = false;
>>> +
>>> +			if (val & GICH_LR_PENDING_BIT) {
>>> +				irq->line_level = vgic_get_phys_line_level(irq);
>>> +				resample = !irq->line_level;
>>> +			} else if (vgic_irq_needs_resampling(irq) &&
>>> +				   !(irq->active || irq->pending_latch)) {
>> I'm having a hard time figuring out when and why a level sensitive
>> can have pending_latch = true.
>>
>> I looked kvm_vgic_inject_irq(), and that function sets pending_latch
>> only for edge triggered interrupts (it sets line_level for level
>> sensitive ones). But irq_is_pending() looks at **both**
>> pending_latch and line_level for level sensitive interrupts.
> Yes, and that's what an implementation requires.
>
>> The only place that I've found that sets pending_latch regardless of
>> the interrupt type is in vgic_mmio_write_spending() (called on a
>> trapped write to GICD_ISENABLER).
> Are you sure? It really should be GICD_ISPENDR. I'll assume that this
> is what you mean below.

Yes, that's what I meant, sorry for the confusion.

>
>> vgic_v2_populate_lr() clears
>> pending_latch only for edge triggered interrupts, so that leaves
>> vgic_v2_fold_lr_state() as the only function pending_latch is
>> cleared for level sensitive interrupts, when the interrupt has been
>> handled by the guest. Are we doing all of this to emulate the fact
>> that level sensitive interrupts (either purely virtual or hw mapped)
>> made pending by a write to GICD_ISENABLER remain pending until they
>> are handled by the guest?
> Yes, or cleared by a write to GICD_ICPENDR. You really need to think
> of the input into the GIC as some sort of OR gate combining both the
> line level and the PEND register. With a latch for edge interrupts.
>
> Have a look at Figure 4-10 ("Logic of the pending status of a
> level-sensitive interrupt") in the GICv2 arch spec (ARM IHI 0048B.b)
> to see what I actually mean.
>
>> If that is the case, then I think this is what the code is doing:
>>
>> - There's no functional change when the irqchip has HW deactivation
>>
>> - For level sensitive, hw mapped interrupts made pending by a write
>> to GICD_ISENABLER and not yet handled by the guest (pending_latch ==
>> true) we don't clear the pending state of the interrupt.
>>
>> - For level sensitive, hw mapped interrupts we clear the pending
>> state in the GIC and the device will assert the interrupt again if
>> it's still pending at the device level. I have a question about
>> this. Why don't we sample the interrupt state by calling
>> vgic_get_phys_line_level()? Because that would be slower than the
>> alternative that you are proposing here?
> Yes. It is *much* faster to read the timer status register (for
> example) than going via an MMIO access to read the (re)distributor
> that will return the same value.

Thank you for the explanation, much appreciated. The patch looks to me like it's
doing the right thing.

Thanks,

Alex

Patch

diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c
index 11934c2af2f4..2c580204f1dc 100644
--- a/arch/arm64/kvm/vgic/vgic-v2.c
+++ b/arch/arm64/kvm/vgic/vgic-v2.c
@@ -108,11 +108,22 @@  void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 		 * If this causes us to lower the level, we have to also clear
 		 * the physical active state, since we will otherwise never be
 		 * told when the interrupt becomes asserted again.
+		 *
+		 * Another case is when the interrupt requires a helping hand
+		 * on deactivation (no HW deactivation, for example).
 		 */
-		if (vgic_irq_is_mapped_level(irq) && (val & GICH_LR_PENDING_BIT)) {
-			irq->line_level = vgic_get_phys_line_level(irq);
+		if (vgic_irq_is_mapped_level(irq)) {
+			bool resample = false;
+
+			if (val & GICH_LR_PENDING_BIT) {
+				irq->line_level = vgic_get_phys_line_level(irq);
+				resample = !irq->line_level;
+			} else if (vgic_irq_needs_resampling(irq) &&
+				   !(irq->active || irq->pending_latch)) {
+				resample = true;
+			}
 
-			if (!irq->line_level)
+			if (resample)
 				vgic_irq_set_phys_active(irq, false);
 		}
 
@@ -152,7 +163,7 @@  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
 	if (irq->group)
 		val |= GICH_LR_GROUP1;
 
-	if (irq->hw) {
+	if (irq->hw && !vgic_irq_needs_resampling(irq)) {
 		val |= GICH_LR_HW;
 		val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
 		/*
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 41ecf219c333..66004f61cd83 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -101,11 +101,22 @@  void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 		 * If this causes us to lower the level, we have to also clear
 		 * the physical active state, since we will otherwise never be
 		 * told when the interrupt becomes asserted again.
+		 *
+		 * Another case is when the interrupt requires a helping hand
+		 * on deactivation (no HW deactivation, for example).
 		 */
-		if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT)) {
-			irq->line_level = vgic_get_phys_line_level(irq);
+		if (vgic_irq_is_mapped_level(irq)) {
+			bool resample = false;
+
+			if (val & ICH_LR_PENDING_BIT) {
+				irq->line_level = vgic_get_phys_line_level(irq);
+				resample = !irq->line_level;
+			} else if (vgic_irq_needs_resampling(irq) &&
+				   !(irq->active || irq->pending_latch)) {
+				resample = true;
+			}
 
-			if (!irq->line_level)
+			if (resample)
 				vgic_irq_set_phys_active(irq, false);
 		}
 
@@ -136,7 +147,7 @@  void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
 		}
 	}
 
-	if (irq->hw) {
+	if (irq->hw && !vgic_irq_needs_resampling(irq)) {
 		val |= ICH_LR_HW;
 		val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
 		/*
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index e5f06df000f2..e602d848fc1a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -99,6 +99,11 @@  enum vgic_irq_config {
  * kvm_arm_get_running_vcpu() to get the vcpu pointer for private IRQs.
  */
 struct irq_ops {
+	/* Per interrupt flags for special-cased interrupts */
+	unsigned long flags;
+
+#define VGIC_IRQ_SW_RESAMPLE	BIT(0)	/* Clear the active state for resampling */
+
 	/*
 	 * Callback function pointer to in-kernel devices that can tell us the
 	 * state of the input level of mapped level-triggered IRQ faster than
@@ -150,6 +155,11 @@  struct vgic_irq {
 					   for in-kernel devices. */
 };
 
+static inline bool vgic_irq_needs_resampling(struct vgic_irq *irq)
+{
+	return irq->ops && (irq->ops->flags & VGIC_IRQ_SW_RESAMPLE);
+}
+
 struct vgic_register_region;
 struct vgic_its;
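The new arm_vgic.h helper is a NULL-safe test of a flag in `irq->ops`. Below is a standalone sketch of how an interrupt might opt in; `MODEL_IRQ_SW_RESAMPLE`, `struct model_irq_ops` and `model_sw_resample_ops` are hypothetical stand-ins for VGIC_IRQ_SW_RESAMPLE, struct irq_ops, and whatever ops a device would register on hosts without HW deactivation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BIT(n)          (1UL << (n))
#define MODEL_IRQ_SW_RESAMPLE MODEL_BIT(0) /* stands in for VGIC_IRQ_SW_RESAMPLE */

/* Hypothetical subset of struct irq_ops: only the new flags field. */
struct model_irq_ops {
	unsigned long flags;
};

/* Hypothetical subset of struct vgic_irq. */
struct model_vgic_irq {
	const struct model_irq_ops *ops; /* NULL for most interrupts */
};

/* Same test as vgic_irq_needs_resampling(): NULL ops means no special casing. */
bool model_needs_resampling(const struct model_vgic_irq *irq)
{
	return irq->ops && (irq->ops->flags & MODEL_IRQ_SW_RESAMPLE);
}

/* Hypothetical ops a device could register when the host lacks HW deactivation. */
const struct model_irq_ops model_sw_resample_ops = {
	.flags = MODEL_IRQ_SW_RESAMPLE,
};
```

Keeping the flag in the shared ops structure means the LR code only pays for a pointer test in the common case where no ops are registered.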