Message ID | 20190208144300.75299-1-marc.zyngier@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm/arm64: arch_timer: Mark physical interrupt active when a virtual interrupt is pending | expand |
Hi Marc, On 08/02/2019 14:43, Marc Zyngier wrote: > When a guest gets scheduled, KVM performs a "load" operation, > which for the timer includes evaluating the virtual "active" state > of the interrupt, and replicating it on the physical side. This > ensures that the deactivation in the guest will also take place > in the physical GIC distributor. > > If the interrupt is not yet active, we flag it as inactive on the > physical side. This means that on restoring the timer registers, > if the timer has expired, we'll immediately take an interrupt. > That's absolutely fine, as the interrupt will then be flagged as > active on the physical side. What this assumes though is that we'll > enter the guest right after having taken the interrupt, and that > the guest will quickly ACK the interrupt, making it active at on Nit: "at on" -> pick one > the virtual side. > > It turns out that quite often, this assumption doesn't really hold. > The guest may be preempted on the back on this interrupt, either on the back of* > from kernel space or whilst running at EL1 when a host interrupt > fires. When this happens, we repeat the whole sequence on the > next load (interrupt marked as inactive, timer registers restored, > interrupt fires). And if it takes a really long time for a guest > to activate the interrupt (as it does with nested virt), we end-up > with many such events in quick succession, leading to the guest only > making very slow progress. > > This can also be seen with the number of virtual timer interrupt on the > host being far greater than the same number in the guest. > > An easy way to fix this is to evaluate the timer state when performing > the "load" operation, just like we do when the interrupt actually fires. > If the timer has a pending virtual interrupt at this stage, then we > can safely flag the physical interrupt as being active, which prevents > spurious exits. > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Otherwise, I think the change makes sense: Reviewed-by: Julien Thierry <julien.thierry@arm.com> Cheers, > --- > virt/kvm/arm/arch_timer.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c > index 7449651ae2e5..70c18479ccd5 100644 > --- a/virt/kvm/arm/arch_timer.c > +++ b/virt/kvm/arm/arch_timer.c > @@ -487,12 +487,21 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo > static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx) > { > struct kvm_vcpu *vcpu = ctx->vcpu; > - bool phys_active; > + bool phys_active = false; > + > + /* > + * Update the timer output so that it is likely to match the > + * state we're about to restore. If the timer expires between > + * this point and the register restoration, we'll take the > + * interrupt anyway. > + */ > + kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx); > > if (irqchip_in_kernel(vcpu->kvm)) > phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq); > - else > - phys_active = ctx->irq.level; > + > + phys_active |= ctx->irq.level; > + > set_timer_irq_phys_active(ctx, phys_active); > } > >
On Fri, Feb 08, 2019 at 02:43:00PM +0000, Marc Zyngier wrote: > When a guest gets scheduled, KVM performs a "load" operation, > which for the timer includes evaluating the virtual "active" state > of the interrupt, and replicating it on the physical side. This > ensures that the deactivation in the guest will also take place > in the physical GIC distributor. > > If the interrupt is not yet active, we flag it as inactive on the > physical side. This means that on restoring the timer registers, > if the timer has expired, we'll immediately take an interrupt. > That's absolutely fine, as the interrupt will then be flagged as > active on the physical side. What this assumes though is that we'll > enter the guest right after having taken the interrupt, and that > the guest will quickly ACK the interrupt, making it active at on > the virtual side. > > It turns out that quite often, this assumption doesn't really hold. > The guest may be preempted on the back on this interrupt, either > from kernel space or whilst running at EL1 when a host interrupt > fires. When this happens, we repeat the whole sequence on the > next load (interrupt marked as inactive, timer registers restored, > interrupt fires). And if it takes a really long time for a guest > to activate the interrupt (as it does with nested virt), we end-up > with many such events in quick succession, leading to the guest only > making very slow progress. > > This can also be seen with the number of virtual timer interrupt on the > host being far greater than the same number in the guest. > > An easy way to fix this is to evaluate the timer state when performing > the "load" operation, just like we do when the interrupt actually fires. > If the timer has a pending virtual interrupt at this stage, then we > can safely flag the physical interrupt as being active, which prevents > spurious exits. > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> > --- > virt/kvm/arm/arch_timer.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c > index 7449651ae2e5..70c18479ccd5 100644 > --- a/virt/kvm/arm/arch_timer.c > +++ b/virt/kvm/arm/arch_timer.c > @@ -487,12 +487,21 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo > static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx) > { > struct kvm_vcpu *vcpu = ctx->vcpu; > - bool phys_active; > + bool phys_active = false; > + > + /* > + * Update the timer output so that it is likely to match the > + * state we're about to restore. If the timer expires between > + * this point and the register restoration, we'll take the > + * interrupt anyway. > + */ > + kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx); > > if (irqchip_in_kernel(vcpu->kvm)) > phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq); > - else > - phys_active = ctx->irq.level; > + > + phys_active |= ctx->irq.level; > + > set_timer_irq_phys_active(ctx, phys_active); > } > > -- > 2.20.1 > Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c index 7449651ae2e5..70c18479ccd5 100644 --- a/virt/kvm/arm/arch_timer.c +++ b/virt/kvm/arm/arch_timer.c @@ -487,12 +487,21 @@ static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, boo static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx) { struct kvm_vcpu *vcpu = ctx->vcpu; - bool phys_active; + bool phys_active = false; + + /* + * Update the timer output so that it is likely to match the + * state we're about to restore. If the timer expires between + * this point and the register restoration, we'll take the + * interrupt anyway. + */ + kvm_timer_update_irq(ctx->vcpu, kvm_timer_should_fire(ctx), ctx); if (irqchip_in_kernel(vcpu->kvm)) phys_active = kvm_vgic_map_is_active(vcpu, ctx->irq.irq); - else - phys_active = ctx->irq.level; + + phys_active |= ctx->irq.level; + set_timer_irq_phys_active(ctx, phys_active); }
When a guest gets scheduled, KVM performs a "load" operation, which for the timer includes evaluating the virtual "active" state of the interrupt, and replicating it on the physical side. This ensures that the deactivation in the guest will also take place in the physical GIC distributor. If the interrupt is not yet active, we flag it as inactive on the physical side. This means that on restoring the timer registers, if the timer has expired, we'll immediately take an interrupt. That's absolutely fine, as the interrupt will then be flagged as active on the physical side. What this assumes though is that we'll enter the guest right after having taken the interrupt, and that the guest will quickly ACK the interrupt, making it active at on the virtual side. It turns out that quite often, this assumption doesn't really hold. The guest may be preempted on the back on this interrupt, either from kernel space or whilst running at EL1 when a host interrupt fires. When this happens, we repeat the whole sequence on the next load (interrupt marked as inactive, timer registers restored, interrupt fires). And if it takes a really long time for a guest to activate the interrupt (as it does with nested virt), we end-up with many such events in quick succession, leading to the guest only making very slow progress. This can also be seen with the number of virtual timer interrupt on the host being far greater than the same number in the guest. An easy way to fix this is to evaluate the timer state when performing the "load" operation, just like we do when the interrupt actually fires. If the timer has a pending virtual interrupt at this stage, then we can safely flag the physical interrupt as being active, which prevents spurious exits. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> --- virt/kvm/arm/arch_timer.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)