[2/3] KVM: arm64: nv: Emulate ISTATUS when emulated timers are fired.

Message ID: 20220824060304.21128-3-gankulkarni@os.amperecomputing.com
State: New, archived
Series: KVM: arm64: nv: Fixes for Nested Virtualization issues

Commit Message

Ganapatrao Kulkarni Aug. 24, 2022, 6:03 a.m. UTC
The Guest-Hypervisor forwards the timer interrupt to the Guest-Guest only
if the timer is enabled and unmasked, and the ISTATUS bit of register
CNTV_CTL_EL0 is set for a loaded timer.

With NV2, the Host-Hypervisor does not emulate the ISTATUS bit when it
forwards the emulated vtimer interrupt to the Guest-Hypervisor. As a
result, the Guest-Hypervisor drops the interrupt, whereas the
Host-Hypervisor has marked it as active and expects the Guest-Guest to
consume and acknowledge it. Because of this, some Guest-Guest vCPUs get
stuck in the idle thread and RCU soft lockups are seen.

The issue is not seen with NV1, since the CNTV_CTL_EL0 read trap handler
emulates the ISTATUS bit.

Add code to set/emulate ISTATUS when the emulated timers fire.

Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
---
 arch/arm64/kvm/arch_timer.c | 5 +++++
 1 file changed, 5 insertions(+)
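
[Editor's illustration, not part of the submitted patch] The forwarding condition described in the commit message can be sketched as a predicate on the CNTV_CTL_EL0 value. The bit positions follow the architectural layout (ENABLE bit 0, IMASK bit 1, ISTATUS bit 2); the macro and function names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL_EL0 control bits (architectural layout). */
#define CTL_ENABLE  (UINT32_C(1) << 0)	/* timer enabled */
#define CTL_IMASK   (UINT32_C(1) << 1)	/* interrupt masked */
#define CTL_ISTATUS (UINT32_C(1) << 2)	/* timer condition met */

/*
 * Hypothetical helper, not a kernel function: true when the timer
 * is enabled, unmasked and reports ISTATUS.
 */
static bool timer_irq_should_forward(uint32_t ctl)
{
	const uint32_t mask = CTL_ENABLE | CTL_IMASK | CTL_ISTATUS;

	return (ctl & mask) == (CTL_ENABLE | CTL_ISTATUS);
}

/* The failure mode: the timer fired, but ISTATUS was never published. */
static void forward_examples(void)
{
	assert(timer_irq_should_forward(CTL_ENABLE | CTL_ISTATUS));
	assert(!timer_irq_should_forward(CTL_ENABLE));
}
```

With ISTATUS missing from the value the Guest-Hypervisor reads, the predicate is false even though the interrupt was injected, which matches the dropped-interrupt symptom reported above.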

Comments

Marc Zyngier Dec. 29, 2022, 1:53 p.m. UTC | #1
On Wed, 24 Aug 2022 07:03:03 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> The Guest-Hypervisor forwards the timer interrupt to the Guest-Guest only
> if the timer is enabled and unmasked, and the ISTATUS bit of register
> CNTV_CTL_EL0 is set for a loaded timer.
> 
> With NV2, the Host-Hypervisor does not emulate the ISTATUS bit when it
> forwards the emulated vtimer interrupt to the Guest-Hypervisor. As a
> result, the Guest-Hypervisor drops the interrupt, whereas the
> Host-Hypervisor has marked it as active and expects the Guest-Guest to
> consume and acknowledge it. Because of this, some Guest-Guest vCPUs get
> stuck in the idle thread and RCU soft lockups are seen.
> 
> The issue is not seen with NV1, since the CNTV_CTL_EL0 read trap handler
> emulates the ISTATUS bit.
> 
> Add code to set/emulate ISTATUS when the emulated timers fire.
> 
> Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> ---
>  arch/arm64/kvm/arch_timer.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 27a6ec46803a..0b32d943d2d5 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -63,6 +63,7 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>  			      struct arch_timer_context *timer,
>  			      enum kvm_arch_timer_regs treg);
>  static bool kvm_arch_timer_get_input_level(int vintid);
> +static u64 read_timer_ctl(struct arch_timer_context *timer);
>  
>  static struct irq_ops arch_timer_irq_ops = {
>  	.get_input_level = kvm_arch_timer_get_input_level,
> @@ -356,6 +357,8 @@ static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
>  		return HRTIMER_RESTART;
>  	}
>  
> +	/* Timer emulated, emulate ISTATUS also */
> +	timer_set_ctl(ctx, read_timer_ctl(ctx));

Why should we do that for non-NV2 configurations?

>  	kvm_timer_update_irq(vcpu, true, ctx);
>  	return HRTIMER_NORESTART;
>  }
> @@ -458,6 +461,8 @@ static void timer_emulate(struct arch_timer_context *ctx)
>  	trace_kvm_timer_emulate(ctx, should_fire);
>  
>  	if (should_fire != ctx->irq.level) {
> +		/* Timer emulated, emulate ISTATUS also */
> +		timer_set_ctl(ctx, read_timer_ctl(ctx));
>  		kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
>  		return;
>  	}

I'm not overly keen on this. Yes, we can set the status bit there. But
conversely, the bit will not get cleared when the guest reprograms the
timer, and will take a full exit/entry cycle for it to appear.

Ergo, the architecture is buggy as memory (the VNCR page) cannot be
used to emulate something as dynamic as a timer.

It is only with FEAT_ECV that we can solve this correctly, by trapping
the counter/timer accesses and emulating them for the guest hypervisor.
I'd rather we add support for that, as I expect all the FEAT_NV2
implementations to have it (and hopefully FEAT_FGT as well).

Thanks,

	M.
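
[Editor's illustration, not part of the thread] The staleness concern raised in this review can be sketched with a toy model in which the timer control value is only a memory copy that the host refreshes on exit. All names are hypothetical, and the model assumes the simplified rule that ISTATUS is set when the timer is enabled and the counter has reached the compare value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CTL_ENABLE  (UINT32_C(1) << 0)
#define CTL_ISTATUS (UINT32_C(1) << 2)

/* Toy model of one timer whose registers live in memory (hypothetical). */
struct timer_model {
	uint64_t cval;		/* compare value, as written by the guest */
	uint32_t ctl_mem;	/* in-memory copy of the control register */
};

/* What a real (or trapped and emulated) register read would report. */
static bool live_istatus(const struct timer_model *t, uint64_t now)
{
	return (t->ctl_mem & CTL_ENABLE) && now >= t->cval;
}

/* Host side: refresh ISTATUS in the memory copy; only runs on an exit. */
static void host_publish(struct timer_model *t, uint64_t now)
{
	if (live_istatus(t, now))
		t->ctl_mem |= CTL_ISTATUS;
	else
		t->ctl_mem &= ~CTL_ISTATUS;
}

/* Guest side: reprogramming is a plain memory write, with no trap. */
static void guest_write_cval(struct timer_model *t, uint64_t cval)
{
	t->cval = cval;
}

static void stale_istatus_example(void)
{
	struct timer_model t = { .cval = 100, .ctl_mem = CTL_ENABLE };

	host_publish(&t, 150);			/* timer expired: bit published */
	assert(t.ctl_mem & CTL_ISTATUS);

	guest_write_cval(&t, 1000);		/* guest pushes the deadline out */
	assert(!live_istatus(&t, 150));		/* condition no longer met... */
	assert(t.ctl_mem & CTL_ISTATUS);	/* ...but the memory copy is stale */

	host_publish(&t, 150);			/* next exit/entry fixes it up */
	assert(!(t.ctl_mem & CTL_ISTATUS));
}
```

The window between the guest write and the next host refresh is the gap that trapping the accesses would close.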
Marc Zyngier Jan. 2, 2023, 11:46 a.m. UTC | #2
On Thu, 29 Dec 2022 13:53:15 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> I'm not overly keen on this. Yes, we can set the status bit there. But
> conversely, the bit will not get cleared when the guest reprograms the
> timer, and will take a full exit/entry cycle for it to appear.
> 
> Ergo, the architecture is buggy as memory (the VNCR page) cannot be
> used to emulate something as dynamic as a timer.
> 
> It is only with FEAT_ECV that we can solve this correctly, by trapping
> the counter/timer accesses and emulating them for the guest hypervisor.
> I'd rather we add support for that, as I expect all the FEAT_NV2
> implementations to have it (and hopefully FEAT_FGT as well).

So I went ahead and implemented some very basic FEAT_ECV support to
correctly emulate the timers (trapping the CTL/CVAL accesses).

Performance dropped like a rock (~30% extra overhead) for L2
exit-heavy workloads that are terminated in userspace, such as virtio.
For those workloads, vcpu_{load,put}() in L1 now generate extra traps,
as we save/restore the timer context, and this is enough to make
things visibly slower, even on a pretty fast machine.

I managed to get *some* performance back by satisfying CTL/CVAL reads
very early on the exit path (a pretty common theme with NV). Which
means we end up needing something like what you have -- only a bit
more complete. I came up with the following:

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 4945c5b96f05..a198a6211e2a 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -450,6 +450,25 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 {
 	int ret;
 
+	/*
+	 * Paper over NV2 brokenness by publishing the interrupt status
+	 * bit. This still results in a poor quality of emulation (guest
+	 * writes will have no effect until the next exit).
+	 *
+	 * But hey, it's fast, right?
+	 */
+	if (vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu) &&
+	    (timer_ctx == vcpu_vtimer(vcpu) || timer_ctx == vcpu_ptimer(vcpu))) {
+		u32 ctl = timer_get_ctl(timer_ctx);
+
+		if (new_level)
+			ctl |= ARCH_TIMER_CTRL_IT_STAT;
+		else
+			ctl &= ~ARCH_TIMER_CTRL_IT_STAT;
+
+		timer_set_ctl(timer_ctx, ctl);
+	}
+
 	timer_ctx->irq.level = new_level;
 	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
 				   timer_ctx->irq.level);

which reports the interrupt state in all cases.

Does this work for you?

Thanks,

	M.
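
[Editor's illustration, not part of the thread] The idea of the diff above, mirroring the interrupt level into ISTATUS in both directions and only for an NV2 guest in hypervisor context, can be sketched outside the kernel as follows. The struct and function names are hypothetical stand-ins; the check that the context is the vtimer or ptimer is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CTL_ENABLE  (UINT32_C(1) << 0)
#define CTL_ISTATUS (UINT32_C(1) << 2)

/* Hypothetical stand-in for the vcpu state the real check consults. */
struct vcpu_model {
	bool has_nv2;		/* models vcpu_has_nv2() */
	bool in_hyp_ctxt;	/* models is_hyp_ctxt() */
	uint32_t ctl;		/* in-memory timer control value */
};

/* Mirror the interrupt level into ISTATUS, setting or clearing it. */
static void publish_level(struct vcpu_model *v, bool new_level)
{
	if (!v->has_nv2 || !v->in_hyp_ctxt)
		return;		/* other configurations are left alone */

	if (new_level)
		v->ctl |= CTL_ISTATUS;
	else
		v->ctl &= ~CTL_ISTATUS;
}

static void publish_level_example(void)
{
	struct vcpu_model nv2 = { true, true, CTL_ENABLE };
	struct vcpu_model plain = { false, false, CTL_ENABLE };

	publish_level(&nv2, true);
	assert(nv2.ctl == (CTL_ENABLE | CTL_ISTATUS));

	publish_level(&nv2, false);	/* lowering the line clears the bit */
	assert(nv2.ctl == CTL_ENABLE);

	publish_level(&plain, true);	/* non-NV2: value untouched */
	assert(plain.ctl == CTL_ENABLE);
}
```

Unlike the originally submitted change, which only touched the bit on the firing paths, this shape also clears the bit when the level drops.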
Ganapatrao Kulkarni Jan. 3, 2023, 4:21 a.m. UTC | #3
On 02-01-2023 05:16 pm, Marc Zyngier wrote:
> Does this work for you?

Thanks, Marc, for the patch. I will try this and report back as soon as possible.

> 
> Thanks,
> 
> 	M.
> 

Thanks,
Ganapat
Ganapatrao Kulkarni Jan. 10, 2023, 8:41 a.m. UTC | #4
On 02-01-2023 05:16 pm, Marc Zyngier wrote:
> So I went ahead and implemented some very basic FEAT_ECV support to
> correctly emulate the timers (trapping the CTL/CVAL accesses).
> 
> Performance dropped like a rock (~30% extra overhead) for L2
> exit-heavy workloads that are terminated in userspace, such as virtio.
> For those workloads, vcpu_{load,put}() in L1 now generate extra traps,
> as we save/restore the timer context, and this is enough to make
> things visibly slower, even on a pretty fast machine.
> 
> I managed to get *some* performance back by satisfying CTL/CVAL reads
> very early on the exit path (a pretty common theme with NV). Which
> means we end up needing something like what you have -- only a bit
> more complete. I came up with the following:

Yes, this is more appropriate; it moves the ISTATUS update to a single place.
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 4945c5b96f05..a198a6211e2a 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -450,6 +450,25 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>   {
>   	int ret;
>   
> +	/*
> +	 * Paper over NV2 brokenness by publishing the interrupt status
> +	 * bit. This still results in a poor quality of emulation (guest
> +	 * writes will have no effect until the next exit).
> +	 *
> +	 * But hey, it's fast, right?
> +	 */
> +	if (vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu) &&
> +	    (timer_ctx == vcpu_vtimer(vcpu) || timer_ctx == vcpu_ptimer(vcpu))) {
> +		u32 ctl = timer_get_ctl(timer_ctx);
> +
> +		if (new_level)
> +			ctl |= ARCH_TIMER_CTRL_IT_STAT;
> +		else
> +			ctl &= ~ARCH_TIMER_CTRL_IT_STAT;
> +
> +		timer_set_ctl(timer_ctx, ctl);
> +	}
> +
>   	timer_ctx->irq.level = new_level;
>   	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
>   				   timer_ctx->irq.level);
> 
> which reports the interrupt state in all cases.
> 
> Does this work for you?

This works.
Are you going to pull this diff/patch into your 6.2-nv tree, or do you
want me to send an updated patch?

> 
> Thanks,
> 
> 	M.
> 

Thanks,
Ganapat
Marc Zyngier Jan. 10, 2023, 10:46 a.m. UTC | #5
On Tue, 10 Jan 2023 08:41:44 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> On 02-01-2023 05:16 pm, Marc Zyngier wrote:
> > Does this work for you?
> 
> This works.
> Are you going to pull this diff/patch into your 6.2-nv tree, or do you
> want me to send an updated patch?

I already have this in the patch titled:

KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state

and the result gets used by:

KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on

(not pasting the SHA1s as I'm still fixing a few nits here and there,
and the commit IDs will change).

Thanks,

	M.
Patch

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 27a6ec46803a..0b32d943d2d5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -63,6 +63,7 @@  static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
 static bool kvm_arch_timer_get_input_level(int vintid);
+static u64 read_timer_ctl(struct arch_timer_context *timer);
 
 static struct irq_ops arch_timer_irq_ops = {
 	.get_input_level = kvm_arch_timer_get_input_level,
@@ -356,6 +357,8 @@  static enum hrtimer_restart kvm_hrtimer_expire(struct hrtimer *hrt)
 		return HRTIMER_RESTART;
 	}
 
+	/* Timer emulated, emulate ISTATUS also */
+	timer_set_ctl(ctx, read_timer_ctl(ctx));
 	kvm_timer_update_irq(vcpu, true, ctx);
 	return HRTIMER_NORESTART;
 }
@@ -458,6 +461,8 @@  static void timer_emulate(struct arch_timer_context *ctx)
 	trace_kvm_timer_emulate(ctx, should_fire);
 
 	if (should_fire != ctx->irq.level) {
+		/* Timer emulated, emulate ISTATUS also */
+		timer_set_ctl(ctx, read_timer_ctl(ctx));
 		kvm_timer_update_irq(ctx->vcpu, should_fire, ctx);
 		return;
 	}