KVM: arm/arm64: arch_timer: Mark physical interrupt active when a virtual interrupt is pending

Message ID 20190208144300.75299-1-marc.zyngier@arm.com
State New
Series
  • KVM: arm/arm64: arch_timer: Mark physical interrupt active when a virtual interrupt is pending

Commit Message

Marc Zyngier Feb. 8, 2019, 2:43 p.m. UTC
When a guest gets scheduled, KVM performs a "load" operation,
which for the timer includes evaluating the virtual "active" state
of the interrupt, and replicating it on the physical side. This
ensures that the deactivation in the guest will also take place
in the physical GIC distributor.

If the interrupt is not yet active, we flag it as inactive on the
physical side.  This means that on restoring the timer registers,
if the timer has expired, we'll immediately take an interrupt.
That's absolutely fine, as the interrupt will then be flagged as
active on the physical side. What this assumes though is that we'll
enter the guest right after having taken the interrupt, and that
the guest will quickly ACK the interrupt, making it active on
the virtual side.

It turns out that quite often, this assumption doesn't really hold.
The guest may be preempted on the back of this interrupt, either
from kernel space or whilst running at EL1 when a host interrupt
fires. When this happens, we repeat the whole sequence on the
next load (interrupt marked as inactive, timer registers restored,
interrupt fires). And if it takes a really long time for a guest
to activate the interrupt (as it does with nested virt), we end up
with many such events in quick succession, leading to the guest only
making very slow progress.

This can also be seen in the number of virtual timer interrupts on the
host being far greater than the number seen in the guest.

An easy way to fix this is to evaluate the timer state when performing
the "load" operation, just like we do when the interrupt actually fires.
If the timer has a pending virtual interrupt at this stage, then we
can safely flag the physical interrupt as being active, which prevents
spurious exits.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/arch_timer.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Julien Thierry Feb. 12, 2019, 4:17 p.m. UTC | #1
Hi Marc,

On 08/02/2019 14:43, Marc Zyngier wrote:
> When a guest gets scheduled, KVM performs a "load" operation,
> which for the timer includes evaluating the virtual "active" state
> of the interrupt, and replicating it on the physical side. This
> ensures that the deactivation in the guest will also take place
> in the physical GIC distributor.
> 
> If the interrupt is not yet active, we flag it as inactive on the
> physical side.  This means that on restoring the timer registers,
> if the timer has expired, we'll immediately take an interrupt.
> That's absolutely fine, as the interrupt will then be flagged as
> active on the physical side. What this assumes though is that we'll
> enter the guest right after having taken the interrupt, and that
> the guest will quickly ACK the interrupt, making it active at on

Nit: "at on" -> pick one

> the virtual side.
> 
> It turns out that quite often, this assumption doesn't really hold.
> The guest may be preempted on the back on this interrupt, either

on the back of*

> from kernel space or whilst running at EL1 when a host interrupt
> fires. When this happens, we repeat the whole sequence on the
> next load (interrupt marked as inactive, timer registers restored,
> interrupt fires). And if it takes a really long time for a guest
> to activate the interrupt (as it does with nested virt), we end-up
> with many such events in quick succession, leading to the guest only
> making very slow progress.
> 
> This can also be seen with the number of virtual timer interrupt on the
> host being far greater than the same number in the guest.
> 
> An easy way to fix this is to evaluate the timer state when performing
> the "load" operation, just like we do when the interrupt actually fires.
> If the timer has a pending virtual interrupt at this stage, then we
> can safely flag the physical interrupt as being active, which prevents
> spurious exits.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Otherwise, I think the change makes sense:

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

Cheers,

> ---
>  virt/kvm/arm/arch_timer.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 7449651ae2e5..70c18479ccd5 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -487,12 +487,21 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo
>  static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
>  {
>  	struct kvm_vcpu *vcpu = ctx->vcpu;
> -	bool phys_active;
> +	bool phys_active = false;
> +
> +	/*
> +	 * Update the timer output so that it is likely to match the
> +	 * state we're about to restore. If the timer expires between
> +	 * this point and the register restoration, we'll take the
> +	 * interrupt anyway.
> +	 */
> +	kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx);
>  
>  	if (irqchip_in_kernel(vcpu->kvm))
>  		phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq);
> -	else
> -		phys_active = ctx->irq.level;
> +
> +	phys_active |= ctx->irq.level;
> +
>  	set_timer_irq_phys_active(ctx, phys_active);
>  }
>  
>
Christoffer Dall Feb. 12, 2019, 4:41 p.m. UTC | #2
On Fri, Feb 08, 2019 at 02:43:00PM +0000, Marc Zyngier wrote:
> When a guest gets scheduled, KVM performs a "load" operation,
> which for the timer includes evaluating the virtual "active" state
> of the interrupt, and replicating it on the physical side. This
> ensures that the deactivation in the guest will also take place
> in the physical GIC distributor.
> 
> If the interrupt is not yet active, we flag it as inactive on the
> physical side.  This means that on restoring the timer registers,
> if the timer has expired, we'll immediately take an interrupt.
> That's absolutely fine, as the interrupt will then be flagged as
> active on the physical side. What this assumes though is that we'll
> enter the guest right after having taken the interrupt, and that
> the guest will quickly ACK the interrupt, making it active at on
> the virtual side.
> 
> It turns out that quite often, this assumption doesn't really hold.
> The guest may be preempted on the back on this interrupt, either
> from kernel space or whilst running at EL1 when a host interrupt
> fires. When this happens, we repeat the whole sequence on the
> next load (interrupt marked as inactive, timer registers restored,
> interrupt fires). And if it takes a really long time for a guest
> to activate the interrupt (as it does with nested virt), we end-up
> with many such events in quick succession, leading to the guest only
> making very slow progress.
> 
> This can also be seen with the number of virtual timer interrupt on the
> host being far greater than the same number in the guest.
> 
> An easy way to fix this is to evaluate the timer state when performing
> the "load" operation, just like we do when the interrupt actually fires.
> If the timer has a pending virtual interrupt at this stage, then we
> can safely flag the physical interrupt as being active, which prevents
> spurious exits.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/arch_timer.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 7449651ae2e5..70c18479ccd5 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -487,12 +487,21 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo
>  static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
>  {
>  	struct kvm_vcpu *vcpu = ctx->vcpu;
> -	bool phys_active;
> +	bool phys_active = false;
> +
> +	/*
> +	 * Update the timer output so that it is likely to match the
> +	 * state we're about to restore. If the timer expires between
> +	 * this point and the register restoration, we'll take the
> +	 * interrupt anyway.
> +	 */
> +	kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx);
>  
>  	if (irqchip_in_kernel(vcpu->kvm))
>  		phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq);
> -	else
> -		phys_active = ctx->irq.level;
> +
> +	phys_active |= ctx->irq.level;
> +
>  	set_timer_irq_phys_active(ctx, phys_active);
>  }
>  
> -- 
> 2.20.1
> 
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>

Patch

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 7449651ae2e5..70c18479ccd5 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -487,12 +487,21 @@  static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo
 static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
 {
 	struct kvm_vcpu *vcpu = ctx->vcpu;
-	bool phys_active;
+	bool phys_active = false;
+
+	/*
+	 * Update the timer output so that it is likely to match the
+	 * state we're about to restore. If the timer expires between
+	 * this point and the register restoration, we'll take the
+	 * interrupt anyway.
+	 */
+	kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx);
 
 	if (irqchip_in_kernel(vcpu->kvm))
 		phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq);
-	else
-		phys_active = ctx->irq.level;
+
+	phys_active |= ctx->irq.level;
+
 	set_timer_irq_phys_active(ctx, phys_active);
 }