diff mbox series

[2/2] KVM: arm64: timers: Adjust CVAL of a ptimer across guest entry and exits

Message ID 20230817060314.535987-3-gankulkarni@os.amperecomputing.com (mailing list archive)
State New, archived
Headers show
Series Avoid spurious ptimer interrupts for non-zero cntpoff | expand

Commit Message

Ganapatrao Kulkarni Aug. 17, 2023, 6:03 a.m. UTC
As per FEAT_ECV, when HCR_EL2.{E2H, TGE} == {1, 1}, Enhanced Counter
Virtualization functionality is disabled and CNTPOFF_EL2 value is treated
as zero. On VHE host, E2H and TGE are set, hence it is required
to adjust CVAL by incrementing it by CNTPOFF_EL2 after guest
exit to avoid false physical timer interrupts and also
decrement/restore CVAL before the guest entry.

Signed-off-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
---
 arch/arm64/kvm/arch_timer.c     | 32 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c | 13 +++++++++++++
 include/kvm/arm_arch_timer.h    |  1 +
 3 files changed, 46 insertions(+)

Comments

Marc Zyngier Aug. 17, 2023, 8:27 a.m. UTC | #1
[Fixing Christoffer's email address]

On Thu, 17 Aug 2023 07:03:14 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> As per FEAT_ECV, when HCR_EL2.{E2H, TGE} == {1, 1}, Enhanced Counter
> Virtualization functionality is disabled and CNTPOFF_EL2 value is treated
> as zero. On VHE host, E2H and TGE are set, hence it is required
> to adjust CVAL by incrementing it by CNTPOFF_EL2 after guest
> exit to avoid false physical timer interrupts and also
> decrement/restore CVAL before the guest entry.

No, this is wrong. Neither E2H nor TGE have any impact on writing to
CNTPOFF_EL2, nor does it have an impact on CNTP_CVAL_EL0. Just read
the pseudocode to convince yourself.

CNTPOFF_EL2 is applied at exactly two points: when SW is reading
CNTPCT_EL0 from EL1 while {E2H,TGE}=={1, 0} and when the HW is
comparing CNTPCT_EL0 with the CNTP_CVAL_EL0. In both cases the offset
is subtracted from the counter. And that's the point where the running
EL matters. Which means that CNTPOFF_EL2 behaves exactly like
CNTVOFF_EL2. No ifs, no buts.

The behaviour you are describing tends to indicate that your HW is
applying CNTPOFF_EL2 to CVAL, which would be a gold plated bug.

It doesn't mean that the KVM code is correct either. I'm seeing pretty
bad behaviours on a machine without CNTPOFF_EL2. But what you are
suggesting is IMO a misunderstanding of the architecture.

Thanks,

	M.
Ganapatrao Kulkarni Aug. 17, 2023, 9:27 a.m. UTC | #2
Hi Marc,

On 17-08-2023 01:57 pm, Marc Zyngier wrote:
> [Fixing Christoffer's email address]

Thanks.
> 
> On Thu, 17 Aug 2023 07:03:14 +0100,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>> As per FEAT_ECV, when HCR_EL2.{E2H, TGE} == {1, 1}, Enhanced Counter
>> Virtualization functionality is disabled and CNTPOFF_EL2 value is treated
>> as zero. On VHE host, E2H and TGE are set, hence it is required
>> to adjust CVAL by incrementing it by CNTPOFF_EL2 after guest
>> exit to avoid false physical timer interrupts and also
>> decrement/restore CVAL before the guest entry.
> 
> No, this is wrong. Neither E2H nor TGE have any impact on writing to
> CNTPOFF_EL2, nor does it have an impact on CNTP_CVAL_EL0. Just read
> the pseudocode to convince yourself.
> 
> CNTPOFF_EL2 is applied at exactly two points: when SW is reading
> CNTPCT_EL0 from EL1 while {E2H,TGE}=={1, 0} and when the HW is
> comparing CNTPCT_EL0 with the CNTP_CVAL_EL0. In both cases the offset
> is subtracted from the counter. And that's the point where the running
> EL matters. Which means that CNTPOFF_EL2 behaves exactly like
> CNTVOFF_EL2. No ifs, no buts.

As per ARM ARM (ARM DDI 0487J.a page D11-5989)
"When FEAT_ECV is implemented, the CNTPOFF_EL2 register allows an offset 
to be applied to the physical counter, as viewed from EL1 and EL0, and 
to the EL1 physical timer. The functionality of this 64-bit register is 
affected by CNTHCTL_EL2.ECV."

As per ARM ARM (ARM DDI 0487J.a page D19-7857)
"When HCR_EL2.{E2H, TGE} == {1, 1} or SCR_EL3.{NS, EEL2} == {0, 0}, then
Enhanced Counter Virtualization functionality is disabled."

"The EL1 physical timer interrupt is triggered when ((PCount<63:0> -
CNTPOFF_EL2<63:0>) - PCVal<63:0>) is greater than or equal to 0."

As per ARM ARM (ARM DDI 0487J.a page D19-7938)
"When EL2 is implemented and enabled in the current Security state, the 
physical counter uses a fixed physical offset of *zero* if any of the 
following are true:
• CNTHCTL_EL2.ECV is 0.
• SCR_EL3.ECVEn is 0.
• HCR_EL2.{E2H, TGE} is {1, 1}."

In VHE host hypervisor, E2H=TGE=1 hence ECV is disabled and Ptimer 
interrupt is triggered based on PCount<63:0> - PCVal<63:0>

Since cval is set by Guest as per offsetted PCounter value and pCount is 
not subtracted by CNTPOFF when in VHE-L0, results in cval becoming much 
lesser than physical counter(bumped up since CNTPOFF is zero) and timer 
interrupt trigger condition is met falsely.

There is no issue/impact on cval due to ECV, however it can be/is 
manipulated to handle this on and off of CNTPOFF/ECV.

IIUC, CNTPOFF and CNTVOFF are not same as per specification.
> 
> The behaviour you are describing tends to indicate that your HW is
> applying CNTPOFF_EL2 to CVAL, which would be a gold plated bug.
> 
> It doesn't mean that the KVM code is correct either. I'm seeing pretty
> bad behaviours on a machine without CNTPOFF_EL2. But what you are
> suggesting is IMO a misunderstanding of the architecture.
> 
> Thanks,
> 
> 	M.
>

Thanks,
Ganapat
Marc Zyngier Aug. 17, 2023, 10:04 a.m. UTC | #3
On Thu, 17 Aug 2023 10:27:55 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> Hi Marc,
> 
> On 17-08-2023 01:57 pm, Marc Zyngier wrote:
> > [Fixing Christoffer's email address]
> 
> Thanks.
> > 
> > On Thu, 17 Aug 2023 07:03:14 +0100,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >> As per FEAT_ECV, when HCR_EL2.{E2H, TGE} == {1, 1}, Enhanced Counter
> >> Virtualization functionality is disabled and CNTPOFF_EL2 value is treated
> >> as zero. On VHE host, E2H and TGE are set, hence it is required
> >> to adjust CVAL by incrementing it by CNTPOFF_EL2 after guest
> >> exit to avoid false physical timer interrupts and also
> >> decrement/restore CVAL before the guest entry.
> > 
> > No, this is wrong. Neither E2H nor TGE have any impact on writing to
> > CNTPOFF_EL2, nor does it have an impact on CNTP_CVAL_EL0. Just read
> > the pseudocode to convince yourself.
> > 
> > CNTPOFF_EL2 is applied at exactly two points: when SW is reading
> > CNTPCT_EL0 from EL1 while {E2H,TGE}=={1, 0} and when the HW is
> > comparing CNTPCT_EL0 with the CNTP_CVAL_EL0. In both cases the offset
> > is subtracted from the counter. And that's the point where the running
> > EL matters. Which means that CNTPOFF_EL2 behaves exactly like
> > CNTVOFF_EL2. No ifs, no buts.
> 
> As per ARM ARM (ARM DDI 0487J.a page D11-5989)
> "When FEAT_ECV is implemented, the CNTPOFF_EL2 register allows an
> offset to be applied to the physical counter, as viewed from EL1 and
> EL0, and to the EL1 physical timer. The functionality of this 64-bit
> register is affected by CNTHCTL_EL2.ECV."
> 
> As per ARM ARM (ARM DDI 0487J.a page D19-7857)
> "When HCR_EL2.{E2H, TGE} == {1, 1} or SCR_EL3.{NS, EEL2} == {0, 0}, then
> Enhanced Counter Virtualization functionality is disabled."

I think this is just bad writing. The functionality isn't disabled,
since you can access CNTPOFF_EL2. It is just that, as for CNTVOFF, the
offset isn't applied to reads of the physical counter.

> 
> "The EL1 physical timer interrupt is triggered when ((PCount<63:0> -
> CNTPOFF_EL2<63:0>) - PCVal<63:0>) is greater than or equal to 0."
> 
> As per ARM ARM (ARM DDI 0487J.a page D19-7938)
> "When EL2 is implemented and enabled in the current Security state,
> the physical counter uses a fixed physical offset of *zero* if any of
> the following are true:
> • CNTHCTL_EL2.ECV is 0.
> • SCR_EL3.ECVEn is 0.
> • HCR_EL2.{E2H, TGE} is {1, 1}."
> 
> In VHE host hypervisor, E2H=TGE=1 hence ECV is disabled and Ptimer
> interrupt is triggered based on PCount<63:0> - PCVal<63:0>

ECV is *not* disabled. It is just that CNTPOFF isn't applied to the
counter as viewed from EL2.

> 
> Since cval is set by Guest as per offsetted PCounter value and pCount
> is not subtracted by CNTPOFF when in VHE-L0, results in cval becoming
> much lesser than physical counter(bumped up since CNTPOFF is zero) and
> timer interrupt trigger condition is met falsely.

Again, you're misreading the spec. When the guest reads the counter,
it reads something that is offset by CNTPOFF. When the guest programs
CVAL, it is an offset from the value it has read. When the HW compares
the physical counter to CVAL, it applies CNTPOFF to the counter.

There is no context to these things, and the value of TGE should not
matter. And even if it did, at the point where we enter the guest,
everything should be hunky dory: the HW has all the information, and
no interrupt should fire if the offseted counter is less than the
deadline.

Again, this is exactly like CNTVOFF.

> There is no issue/impact on cval due to ECV, however it can be/is
> manipulated to handle this on and off of CNTPOFF/ECV.
> 
> IIUC, CNTPOFF and CNTVOFF are not same as per specification.

We clearly are reading different things in the spec, as I cannot see
*any* difference in how they are described. If your HW has built them
differently, you have a problem.

Thanks,

	M.
Marc Zyngier Aug. 22, 2023, 12:16 p.m. UTC | #4
On Thu, 17 Aug 2023 10:27:55 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> Hi Marc,
> 
> On 17-08-2023 01:57 pm, Marc Zyngier wrote:
> > [Fixing Christoffer's email address]
> 
> Thanks.
> > 
> > On Thu, 17 Aug 2023 07:03:14 +0100,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >> As per FEAT_ECV, when HCR_EL2.{E2H, TGE} == {1, 1}, Enhanced Counter
> >> Virtualization functionality is disabled and CNTPOFF_EL2 value is treated
> >> as zero. On VHE host, E2H and TGE are set, hence it is required
> >> to adjust CVAL by incrementing it by CNTPOFF_EL2 after guest
> >> exit to avoid false physical timer interrupts and also
> >> decrement/restore CVAL before the guest entry.
> > 
> > No, this is wrong. Neither E2H nor TGE have any impact on writing to
> > CNTPOFF_EL2, nor does it have an impact on CNTP_CVAL_EL0. Just read
> > the pseudocode to convince yourself.
> > 
> > CNTPOFF_EL2 is applied at exactly two points: when SW is reading
> > CNTPCT_EL0 from EL1 while {E2H,TGE}=={1, 0} and when the HW is
> > comparing CNTPCT_EL0 with the CNTP_CVAL_EL0. In both cases the offset
> > is subtracted from the counter. And that's the point where the running
> > EL matters. Which means that CNTPOFF_EL2 behaves exactly like
> > CNTVOFF_EL2. No ifs, no buts.
> 
> As per ARM ARM (ARM DDI 0487J.a page D11-5989)
> "When FEAT_ECV is implemented, the CNTPOFF_EL2 register allows an
> offset to be applied to the physical counter, as viewed from EL1 and
> EL0, and to the EL1 physical timer. The functionality of this 64-bit
> register is affected by CNTHCTL_EL2.ECV."
> 
> As per ARM ARM (ARM DDI 0487J.a page D19-7857)
> "When HCR_EL2.{E2H, TGE} == {1, 1} or SCR_EL3.{NS, EEL2} == {0, 0}, then
> Enhanced Counter Virtualization functionality is disabled."
> 
> "The EL1 physical timer interrupt is triggered when ((PCount<63:0> -
> CNTPOFF_EL2<63:0>) - PCVal<63:0>) is greater than or equal to 0."
> 
> As per ARM ARM (ARM DDI 0487J.a page D19-7938)
> "When EL2 is implemented and enabled in the current Security state,
> the physical counter uses a fixed physical offset of *zero* if any of
> the following are true:
> • CNTHCTL_EL2.ECV is 0.
> • SCR_EL3.ECVEn is 0.
> • HCR_EL2.{E2H, TGE} is {1, 1}."
> 
> In VHE host hypervisor, E2H=TGE=1 hence ECV is disabled and Ptimer
> interrupt is triggered based on PCount<63:0> - PCVal<63:0>
> 
> Since cval is set by Guest as per offsetted PCounter value and pCount
> is not subtracted by CNTPOFF when in VHE-L0, results in cval becoming
> much lesser than physical counter(bumped up since CNTPOFF is zero) and
> timer interrupt trigger condition is met falsely.
> 
> There is no issue/impact on cval due to ECV, however it can be/is
> manipulated to handle this on and off of CNTPOFF/ECV.
> 
> IIUC, CNTPOFF and CNTVOFF are not same as per specification.

I owe you an apology. You are correct, and I was totally wrong.  I'm
truly amazed how wrong we got this part of the architecture, but it is
way too late for any change, and we'll have to live with it.

Now, to the actual patch: I think the way you offset CVAL isn't
great. You should never have to change it on entry, and you should
instead read the correct value from memory. Then, save/restore of CVAL
must be amended to always apply the offset. Can you give the hack
below a go on your HW?

Thanks,

	M.

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index ea46b4e1e7a8..bb80fdd84676 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -55,11 +55,6 @@ static struct irq_ops arch_timer_irq_ops = {
 	.get_input_level = kvm_arch_timer_get_input_level,
 };
 
-static bool has_cntpoff(void)
-{
-	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
-}
-
 static int nr_timers(struct kvm_vcpu *vcpu)
 {
 	if (!vcpu_has_nv2(vcpu))
@@ -180,7 +175,7 @@ u64 kvm_phys_timer_read(void)
 	return timecounter->cc->read(timecounter->cc);
 }
 
-static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
+void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
 	if (vcpu_has_nv2(vcpu)) {
 		if (is_hyp_ctxt(vcpu)) {
@@ -569,8 +564,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
 		cval = read_sysreg_el0(SYS_CNTP_CVAL);
 
-		if (!has_cntpoff())
-			cval -= timer_get_offset(ctx);
+		cval -= timer_get_offset(ctx);
 
 		timer_set_cval(ctx, cval);
 
@@ -657,8 +651,7 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 		cval = timer_get_cval(ctx);
 		offset = timer_get_offset(ctx);
 		set_cntpoff(offset);
-		if (!has_cntpoff())
-			cval += offset;
+		cval += offset;
 		write_sysreg_el0(cval, SYS_CNTP_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 9611b4eaf661..6e3d3e16563f 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -90,6 +90,20 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 
 	___activate_traps(vcpu, hcr);
 
+	if (has_cntpoff()) {
+		struct timer_map map;
+
+		get_timer_map(vcpu, &map);
+
+		if (map.direct_ptimer == vcpu_ptimer(vcpu))
+			val = __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
+		else
+			val = __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
+
+		isb();
+		write_sysreg_s(val, SYS_CNTP_CVAL_EL0);
+	}
+
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_ELx_TTA;
 	val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN |
@@ -131,6 +145,23 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 
 	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
 
+	if (has_cntpoff()) {
+		struct timer_map map;
+		u64 val, offset;
+
+		get_timer_map(vcpu, &map);
+
+		val = read_sysreg_s(SYS_CNTP_CVAL_EL0);
+		if (map.direct_ptimer == vcpu_ptimer(vcpu))
+			__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = val;
+		else
+			__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = val;
+
+		offset = read_sysreg_s(SYS_CNTPOFF_EL2);
+		write_sysreg_s(val + offset, SYS_CNTP_CVAL_EL0);
+		isb();
+	}
+
 	/*
 	 * ARM errata 1165522 and 1530923 require the actual execution of the
 	 * above before we can switch to the EL2/EL0 translation regime used by
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index ea77a569a907..86a73ad1446a 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -82,6 +82,8 @@ struct timer_map {
 	struct arch_timer_context *emul_ptimer;
 };
 
+void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map);
+
 struct arch_timer_cpu {
 	struct arch_timer_context timers[NR_KVM_TIMERS];
 
@@ -149,4 +151,9 @@ void kvm_timer_cpu_down(void);
 /* CNTKCTL_EL1 valid bits as of DDI0476J.a */
 #define CNTKCTL_VALID_BITS	(BIT(17) | GENMASK_ULL(9, 0))
 
+static inline bool has_cntpoff(void)
+{
+	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
+}
+
 #endif
Ganapatrao Kulkarni Aug. 22, 2023, 2:57 p.m. UTC | #5
On 22-08-2023 05:46 pm, Marc Zyngier wrote:
> On Thu, 17 Aug 2023 10:27:55 +0100,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>> Hi Marc,
>>
>> On 17-08-2023 01:57 pm, Marc Zyngier wrote:
>>> [Fixing Christoffer's email address]
>>
>> Thanks.
>>>
>>> On Thu, 17 Aug 2023 07:03:14 +0100,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>> As per FEAT_ECV, when HCR_EL2.{E2H, TGE} == {1, 1}, Enhanced Counter
>>>> Virtualization functionality is disabled and CNTPOFF_EL2 value is treated
>>>> as zero. On VHE host, E2H and TGE are set, hence it is required
>>>> to adjust CVAL by incrementing it by CNTPOFF_EL2 after guest
>>>> exit to avoid false physical timer interrupts and also
>>>> decrement/restore CVAL before the guest entry.
>>>
>>> No, this is wrong. Neither E2H nor TGE have any impact on writing to
>>> CNTPOFF_EL2, nor does it have an impact on CNTP_CVAL_EL0. Just read
>>> the pseudocode to convince yourself.
>>>
>>> CNTPOFF_EL2 is applied at exactly two points: when SW is reading
>>> CNTPCT_EL0 from EL1 while {E2H,TGE}=={1, 0} and when the HW is
>>> comparing CNTPCT_EL0 with the CNTP_CVAL_EL0. In both cases the offset
>>> is subtracted from the counter. And that's the point where the running
>>> EL matters. Which means that CNTPOFF_EL2 behaves exactly like
>>> CNTVOFF_EL2. No ifs, no buts.
>>
>> As per ARM ARM (ARM DDI 0487J.a page D11-5989)
>> "When FEAT_ECV is implemented, the CNTPOFF_EL2 register allows an
>> offset to be applied to the physical counter, as viewed from EL1 and
>> EL0, and to the EL1 physical timer. The functionality of this 64-bit
>> register is affected by CNTHCTL_EL2.ECV."
>>
>> As per ARM ARM (ARM DDI 0487J.a page D19-7857)
>> "When HCR_EL2.{E2H, TGE} == {1, 1} or SCR_EL3.{NS, EEL2} == {0, 0}, then
>> Enhanced Counter Virtualization functionality is disabled."
>>
>> "The EL1 physical timer interrupt is triggered when ((PCount<63:0> -
>> CNTPOFF_EL2<63:0>) - PCVal<63:0>) is greater than or equal to 0."
>>
>> As per ARM ARM (ARM DDI 0487J.a page D19-7938)
>> "When EL2 is implemented and enabled in the current Security state,
>> the physical counter uses a fixed physical offset of *zero* if any of
>> the following are true:
>> • CNTHCTL_EL2.ECV is 0.
>> • SCR_EL3.ECVEn is 0.
>> • HCR_EL2.{E2H, TGE} is {1, 1}."
>>
>> In VHE host hypervisor, E2H=TGE=1 hence ECV is disabled and Ptimer
>> interrupt is triggered based on PCount<63:0> - PCVal<63:0>
>>
>> Since cval is set by Guest as per offsetted PCounter value and pCount
>> is not subtracted by CNTPOFF when in VHE-L0, results in cval becoming
>> much lesser than physical counter(bumped up since CNTPOFF is zero) and
>> timer interrupt trigger condition is met falsely.
>>
>> There is no issue/impact on cval due to ECV, however it can be/is
>> manipulated to handle this on and off of CNTPOFF/ECV.
>>
>> IIUC, CNTPOFF and CNTVOFF are not same as per specification.
> 
> I owe you an apology. You are correct, and I was totally wrong.  I'm
> truly amazed how wrong we got this part of the architecture, but it is
> way too late for any change, and we'll have to live with it.
> 
Thanks Marc.

> Now, to the actual patch: I think the way you offset CVAL isn't
> great. You should never have to change it on entry, and you should
> instead read the correct value from memory. Then, save/restore of CVAL
> must be amended to always apply the offset. Can you give the hack
> below a go on your HW?

Sure, I will try this and update.

> 
> Thanks,
> 
> 	M.
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index ea46b4e1e7a8..bb80fdd84676 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -55,11 +55,6 @@ static struct irq_ops arch_timer_irq_ops = {
>   	.get_input_level = kvm_arch_timer_get_input_level,
>   };
>   
> -static bool has_cntpoff(void)
> -{
> -	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
> -}
> -
>   static int nr_timers(struct kvm_vcpu *vcpu)
>   {
>   	if (!vcpu_has_nv2(vcpu))
> @@ -180,7 +175,7 @@ u64 kvm_phys_timer_read(void)
>   	return timecounter->cc->read(timecounter->cc);
>   }
>   
> -static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
> +void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
>   {
>   	if (vcpu_has_nv2(vcpu)) {
>   		if (is_hyp_ctxt(vcpu)) {
> @@ -569,8 +564,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>   		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
>   		cval = read_sysreg_el0(SYS_CNTP_CVAL);
>   
> -		if (!has_cntpoff())
> -			cval -= timer_get_offset(ctx);
> +		cval -= timer_get_offset(ctx);
>   
>   		timer_set_cval(ctx, cval);
>   
> @@ -657,8 +651,7 @@ static void timer_restore_state(struct arch_timer_context *ctx)
>   		cval = timer_get_cval(ctx);
>   		offset = timer_get_offset(ctx);
>   		set_cntpoff(offset);
> -		if (!has_cntpoff())
> -			cval += offset;
> +		cval += offset;
>   		write_sysreg_el0(cval, SYS_CNTP_CVAL);
>   		isb();
>   		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 9611b4eaf661..6e3d3e16563f 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -90,6 +90,20 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>   
>   	___activate_traps(vcpu, hcr);
>   
> +	if (has_cntpoff()) {
> +		struct timer_map map;
> +
> +		get_timer_map(vcpu, &map);
> +
> +		if (map.direct_ptimer == vcpu_ptimer(vcpu))
> +			val = __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
> +		else
> +			val = __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
> +
> +		isb();
> +		write_sysreg_s(val, SYS_CNTP_CVAL_EL0);
> +	}
> +
>   	val = read_sysreg(cpacr_el1);
>   	val |= CPACR_ELx_TTA;
>   	val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN |
> @@ -131,6 +145,23 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
>   
>   	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
>   
> +	if (has_cntpoff()) {
> +		struct timer_map map;
> +		u64 val, offset;
> +
> +		get_timer_map(vcpu, &map);
> +
> +		val = read_sysreg_s(SYS_CNTP_CVAL_EL0);
> +		if (map.direct_ptimer == vcpu_ptimer(vcpu))
> +			__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = val;
> +		else
> +			__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = val;
> +
> +		offset = read_sysreg_s(SYS_CNTPOFF_EL2);
> +		write_sysreg_s(val + offset, SYS_CNTP_CVAL_EL0);
> +		isb();
> +	}
> +
>   	/*
>   	 * ARM errata 1165522 and 1530923 require the actual execution of the
>   	 * above before we can switch to the EL2/EL0 translation regime used by
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index ea77a569a907..86a73ad1446a 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -82,6 +82,8 @@ struct timer_map {
>   	struct arch_timer_context *emul_ptimer;
>   };
>   
> +void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map);
> +
>   struct arch_timer_cpu {
>   	struct arch_timer_context timers[NR_KVM_TIMERS];
>   
> @@ -149,4 +151,9 @@ void kvm_timer_cpu_down(void);
>   /* CNTKCTL_EL1 valid bits as of DDI0476J.a */
>   #define CNTKCTL_VALID_BITS	(BIT(17) | GENMASK_ULL(9, 0))
>   
> +static inline bool has_cntpoff(void)
> +{
> +	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
> +}
> +
>   #endif
> 

Thanks,
Ganapat
Ganapatrao Kulkarni Aug. 24, 2023, 6:37 a.m. UTC | #6
On 22-08-2023 08:27 pm, Ganapatrao Kulkarni wrote:
> 
> 
> On 22-08-2023 05:46 pm, Marc Zyngier wrote:
>> On Thu, 17 Aug 2023 10:27:55 +0100,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>
>>>
>>> Hi Marc,
>>>
>>> On 17-08-2023 01:57 pm, Marc Zyngier wrote:
>>>> [Fixing Christoffer's email address]
>>>
>>> Thanks.
>>>>
>>>> On Thu, 17 Aug 2023 07:03:14 +0100,
>>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>>
>>>>> As per FEAT_ECV, when HCR_EL2.{E2H, TGE} == {1, 1}, Enhanced Counter
>>>>> Virtualization functionality is disabled and CNTPOFF_EL2 value is 
>>>>> treated
>>>>> as zero. On VHE host, E2H and TGE are set, hence it is required
>>>>> to adjust CVAL by incrementing it by CNTPOFF_EL2 after guest
>>>>> exit to avoid false physical timer interrupts and also
>>>>> decrement/restore CVAL before the guest entry.
>>>>
>>>> No, this is wrong. Neither E2H nor TGE have any impact on writing to
>>>> CNTPOFF_EL2, nor does it have an impact on CNTP_CVAL_EL0. Just read
>>>> the pseudocode to convince yourself.
>>>>
>>>> CNTPOFF_EL2 is applied at exactly two points: when SW is reading
>>>> CNTPCT_EL0 from EL1 while {E2H,TGE}=={1, 0} and when the HW is
>>>> comparing CNTPCT_EL0 with the CNTP_CVAL_EL0. In both cases the offset
>>>> is subtracted from the counter. And that's the point where the running
>>>> EL matters. Which means that CNTPOFF_EL2 behaves exactly like
>>>> CNTVOFF_EL2. No ifs, no buts.
>>>
>>> As per ARM ARM (ARM DDI 0487J.a page D11-5989)
>>> "When FEAT_ECV is implemented, the CNTPOFF_EL2 register allows an
>>> offset to be applied to the physical counter, as viewed from EL1 and
>>> EL0, and to the EL1 physical timer. The functionality of this 64-bit
>>> register is affected by CNTHCTL_EL2.ECV."
>>>
>>> As per ARM ARM (ARM DDI 0487J.a page D19-7857)
>>> "When HCR_EL2.{E2H, TGE} == {1, 1} or SCR_EL3.{NS, EEL2} == {0, 0}, then
>>> Enhanced Counter Virtualization functionality is disabled."
>>>
>>> "The EL1 physical timer interrupt is triggered when ((PCount<63:0> -
>>> CNTPOFF_EL2<63:0>) - PCVal<63:0>) is greater than or equal to 0."
>>>
>>> As per ARM ARM (ARM DDI 0487J.a page D19-7938)
>>> "When EL2 is implemented and enabled in the current Security state,
>>> the physical counter uses a fixed physical offset of *zero* if any of
>>> the following are true:
>>> • CNTHCTL_EL2.ECV is 0.
>>> • SCR_EL3.ECVEn is 0.
>>> • HCR_EL2.{E2H, TGE} is {1, 1}."
>>>
>>> In VHE host hypervisor, E2H=TGE=1 hence ECV is disabled and Ptimer
>>> interrupt is triggered based on PCount<63:0> - PCVal<63:0>
>>>
>>> Since cval is set by Guest as per offsetted PCounter value and pCount
>>> is not subtracted by CNTPOFF when in VHE-L0, results in cval becoming
>>> much lesser than physical counter(bumped up since CNTPOFF is zero) and
>>> timer interrupt trigger condition is met falsely.
>>>
>>> There is no issue/impact on cval due to ECV, however it can be/is
>>> manipulated to handle this on and off of CNTPOFF/ECV.
>>>
>>> IIUC, CNTPOFF and CNTVOFF are not same as per specification.
>>
>> I owe you an apology. You are correct, and I was totally wrong.  I'm
>> truly amazed how wrong we got this part of the architecture, but it is
>> way too late for any change, and we'll have to live with it.
>>
> Thanks Marc.
> 
>> Now, to the actual patch: I think the way you offset CVAL isn't
>> great. You should never have to change it on entry, and you should
>> instead read the correct value from memory. Then, save/restore of CVAL
>> must be amended to always apply the offset. Can you give the hack
>> below a go on your HW?

I tried this and seems not working, this is due to timer save/restore 
are not called for some of the kvm_exit and entry paths(lighter switches).

I tried changing this patch like, Removed cval adjust from the kvm_entry 
and still cval is adjusted on kvm_exit and in timer_restore_state 
function, reduced cval by offset.

Please let me know, if this is not you intended to try?
If possible, please share the steps or pseudo code.

Thanks,
Ganapat

> 
> Sure, I will try this and update.
> 
>>
>> Thanks,
>>
>>     M.
>>
>> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
>> index ea46b4e1e7a8..bb80fdd84676 100644
>> --- a/arch/arm64/kvm/arch_timer.c
>> +++ b/arch/arm64/kvm/arch_timer.c
>> @@ -55,11 +55,6 @@ static struct irq_ops arch_timer_irq_ops = {
>>       .get_input_level = kvm_arch_timer_get_input_level,
>>   };
>> -static bool has_cntpoff(void)
>> -{
>> -    return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
>> -}
>> -
>>   static int nr_timers(struct kvm_vcpu *vcpu)
>>   {
>>       if (!vcpu_has_nv2(vcpu))
>> @@ -180,7 +175,7 @@ u64 kvm_phys_timer_read(void)
>>       return timecounter->cc->read(timecounter->cc);
>>   }
>> -static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
>> +void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
>>   {
>>       if (vcpu_has_nv2(vcpu)) {
>>           if (is_hyp_ctxt(vcpu)) {
>> @@ -569,8 +564,7 @@ static void timer_save_state(struct 
>> arch_timer_context *ctx)
>>           timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
>>           cval = read_sysreg_el0(SYS_CNTP_CVAL);
>> -        if (!has_cntpoff())
>> -            cval -= timer_get_offset(ctx);
>> +        cval -= timer_get_offset(ctx);
>>           timer_set_cval(ctx, cval);
>> @@ -657,8 +651,7 @@ static void timer_restore_state(struct 
>> arch_timer_context *ctx)
>>           cval = timer_get_cval(ctx);
>>           offset = timer_get_offset(ctx);
>>           set_cntpoff(offset);
>> -        if (!has_cntpoff())
>> -            cval += offset;
>> +        cval += offset;
>>           write_sysreg_el0(cval, SYS_CNTP_CVAL);
>>           isb();
>>           write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
>> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
>> b/arch/arm64/kvm/hyp/vhe/switch.c
>> index 9611b4eaf661..6e3d3e16563f 100644
>> --- a/arch/arm64/kvm/hyp/vhe/switch.c
>> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
>> @@ -90,6 +90,20 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>>       ___activate_traps(vcpu, hcr);
>> +    if (has_cntpoff()) {
>> +        struct timer_map map;
>> +
>> +        get_timer_map(vcpu, &map);
>> +
>> +        if (map.direct_ptimer == vcpu_ptimer(vcpu))
>> +            val = __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
>> +        else
>> +            val = __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
>> +
>> +        isb();
>> +        write_sysreg_s(val, SYS_CNTP_CVAL_EL0);
>> +    }
>> +
>>       val = read_sysreg(cpacr_el1);
>>       val |= CPACR_ELx_TTA;
>>       val &= ~(CPACR_EL1_ZEN_EL0EN | CPACR_EL1_ZEN_EL1EN |
>> @@ -131,6 +145,23 @@ static void __deactivate_traps(struct kvm_vcpu 
>> *vcpu)
>>       write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
>> +    if (has_cntpoff()) {
>> +        struct timer_map map;
>> +        u64 val, offset;
>> +
>> +        get_timer_map(vcpu, &map);
>> +
>> +        val = read_sysreg_s(SYS_CNTP_CVAL_EL0);
>> +        if (map.direct_ptimer == vcpu_ptimer(vcpu))
>> +            __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = val;
>> +        else
>> +            __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = val;
>> +
>> +        offset = read_sysreg_s(SYS_CNTPOFF_EL2);
>> +        write_sysreg_s(val + offset, SYS_CNTP_CVAL_EL0);
>> +        isb();
>> +    }
>> +
>>       /*
>>        * ARM errata 1165522 and 1530923 require the actual execution 
>> of the
>>        * above before we can switch to the EL2/EL0 translation regime 
>> used by
>> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
>> index ea77a569a907..86a73ad1446a 100644
>> --- a/include/kvm/arm_arch_timer.h
>> +++ b/include/kvm/arm_arch_timer.h
>> @@ -82,6 +82,8 @@ struct timer_map {
>>       struct arch_timer_context *emul_ptimer;
>>   };
>> +void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map);
>> +
>>   struct arch_timer_cpu {
>>       struct arch_timer_context timers[NR_KVM_TIMERS];
>> @@ -149,4 +151,9 @@ void kvm_timer_cpu_down(void);
>>   /* CNTKCTL_EL1 valid bits as of DDI0476J.a */
>>   #define CNTKCTL_VALID_BITS    (BIT(17) | GENMASK_ULL(9, 0))
>> +static inline bool has_cntpoff(void)
>> +{
>> +    return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
>> +}
>> +
>>   #endif
>>
> 
> Thanks,
> Ganapat
Marc Zyngier Aug. 28, 2023, 10:17 a.m. UTC | #7
On Thu, 24 Aug 2023 07:37:42 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> >> Now, to the actual patch: I think the way you offset CVAL isn't
> >> great. You should never have to change it on entry, and you should
> >> instead read the correct value from memory. Then, save/restore of CVAL
> >> must be amended to always apply the offset. Can you give the hack
> >> below a go on your HW?
> 
> I tried this and seems not working, this is due to timer save/restore
> are not called for some of the kvm_exit and entry paths(lighter
> switches).

Can you point me to such paths? Are you referring to the ECV handling
of the physical timer registers?

> 
> I tried changing this patch like, Removed cval adjust from the
> kvm_entry and still cval is adjusted on kvm_exit and in
> timer_restore_state function, reduced cval by offset.
> 
> Please let me know, if this is not you intended to try?
> If possible, please share the steps or pseudo code.

What I want to get to is that:

- on entry (TGE having been flipped to 0), the guest's CVAL is always
  reload from memory, because that's the absolute reference. We should
  never load anything else on the CPU.

- on exit (TGE having been flipped to 1), the guest's CVAL is stored
  as the one true value to memory, and the CPU's view is offset by the
  offset.

- the high-level save/restore helpers apply the offsets back and forth
  as if CNTPOFF didn't exist (because that's exactly the case if
  TGE=1).

Now, I'm pretty sure I'm still missing something, but the above is
roughly the scheme I'm trying to follow?

Thanks,

	M.
Ganapatrao Kulkarni Sept. 1, 2023, 12:15 p.m. UTC | #8
On 28-08-2023 03:47 pm, Marc Zyngier wrote:
> On Thu, 24 Aug 2023 07:37:42 +0100,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>>> Now, to the actual patch: I think the way you offset CVAL isn't
>>>> great. You should never have to change it on entry, and you should
>>>> instead read the correct value from memory. Then, save/restore of CVAL
>>>> must be amended to always apply the offset. Can you give the hack
>>>> below a go on your HW?
>>
>> I tried this and seems not working, this is due to timer save/restore
>> are not called for some of the kvm_exit and entry paths(lighter
>> switches).
> 
> Can you point me to such paths? Are you referring to the ECV handling
> of the physical timer registers?
> 
>>
>> I tried changing this patch like, Removed cval adjust from the
>> kvm_entry and still cval is adjusted on kvm_exit and in
>> timer_restore_state function, reduced cval by offset.
>>
>> Please let me know, if this is not you intended to try?
>> If possible, please share the steps or pseudo code.
> 
> What I want to get to is that:
> 
> - on entry (TGE having been flipped to 0), the guest's CVAL is always
>    reload from memory, because that's the absolute reference. We should
>    never load anything else on the CPU.
> 
> - on exit (TGE having been flipped to 1), the guest's CVAL is stored
>    as the one true value to memory, and the CPU's view is offset by the
>    offset.
> 
> - the high-level save/restore helpers apply the offsets back and forth
>    as if CNTPOFF didn't exist (because that's exactly the case if
>    TGE=1).
> 
> Now, I'm pretty sure I'm still missing something, but the above is
> roughly the scheme I'm trying to follow?
> 

Sorry for the confusion. It was my bad, I did some mistake while trying 
your hack patch. Based on your patch and above explanation I have 
created below diff which is working and clean.

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 4eb601e7de50..f22cc733efb1 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -132,6 +132,11 @@ static __always_inline bool has_vhe(void)
  		return cpus_have_final_cap(ARM64_HAS_VIRT_HOST_EXTN);
  }

+static __always_inline bool has_cntpoff(void)
+{
+	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
+}
+
  static __always_inline bool is_protected_kvm_enabled(void)
  {
  	if (is_vhe_hyp_code())
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 75bddab3224f..de625bc7c25c 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -55,11 +55,6 @@ static struct irq_ops arch_timer_irq_ops = {
  	.get_input_level = kvm_arch_timer_get_input_level,
  };

-static bool has_cntpoff(void)
-{
-	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
-}
-
  static int nr_timers(struct kvm_vcpu *vcpu)
  {
  	if (!vcpu_has_nv(vcpu))
@@ -566,10 +561,7 @@ static void timer_save_state(struct 
arch_timer_context *ctx)
  	case TIMER_HPTIMER:
  		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
  		cval = read_sysreg_el0(SYS_CNTP_CVAL);
-
-		if (!has_cntpoff())
-			cval -= timer_get_offset(ctx);
-
+		cval -= timer_get_offset(ctx);
  		timer_set_cval(ctx, cval);

  		/* Disable the timer */
@@ -655,8 +647,7 @@ static void timer_restore_state(struct 
arch_timer_context *ctx)
  		cval = timer_get_cval(ctx);
  		offset = timer_get_offset(ctx);
  		set_cntpoff(offset);
-		if (!has_cntpoff())
-			cval += offset;
+		cval += offset;
  		write_sysreg_el0(cval, SYS_CNTP_CVAL);
  		isb();
  		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
@@ -960,6 +951,59 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
  		kvm_timer_blocking(vcpu);
  }

+static void ptimer_cval_save(struct arch_timer_context *ctx, u64 offset)
+{
+	unsigned long flags;
+	u64 cval;
+
+	local_irq_save(flags);
+	cval = read_sysreg_el0(SYS_CNTP_CVAL);
+	timer_set_cval(ctx, cval);
+	cval += offset;
+	write_sysreg_el0(cval, SYS_CNTP_CVAL);
+	isb();
+	local_irq_restore(flags);
+}
+
+static void ptimer_cval_restore(struct arch_timer_context *ctx, u64 offset)
+{
+	unsigned long flags;
+	u64 cval;
+
+	local_irq_save(flags);
+	cval = timer_get_cval(ctx);
+	write_sysreg_el0(cval, SYS_CNTP_CVAL);
+	isb();
+	local_irq_restore(flags);
+}
+
+void kvm_ptimer_cval_save_restore(struct kvm_vcpu *vcpu, bool save)
+{
+	struct timer_map map;
+	struct arch_timer_cpu *timer;
+	struct arch_timer_context *ctxp;
+	u64 offset;
+
+	get_timer_map(vcpu, &map);
+	ctxp = map.direct_ptimer;
+
+	if (unlikely(ctxp == NULL))
+		return;
+	
+	offset = timer_get_offset(ctxp);
+	if (!offset)
+		return;
+
+	timer = vcpu_timer(ctxp->vcpu);
+	if (!timer->enabled || !ctxp->loaded)
+		return;
+
+	if (save)
+		ptimer_cval_save(ctxp, offset);
+	else
+		ptimer_cval_restore(ctxp, offset);
+}
+
  void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
  {
  	/*
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
b/arch/arm64/kvm/hyp/vhe/switch.c
index 561cb53e19ce..097fcaf7b208 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -100,6 +100,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
  		hcr |= vhcr_el2;
  	}

+	/* Restore CVAL */
+	if (has_cntpoff())
+		kvm_ptimer_cval_save_restore(vcpu, false);
+
  	___activate_traps(vcpu, hcr);

  	val = read_sysreg(cpacr_el1);
@@ -141,6 +145,15 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)

  	___deactivate_traps(vcpu);

+	/*
+	 * For VHE Host, HCR_EL2.{E2H, TGE} is {1, 1}, FEAT_ECV
+	 * is disabled and CNTPOFF_EL2 value is treated as zero.
+	 * Hence, need to save guest written CVAL in memory and
+	 * increment PTIMER's CVAL by CNTPOFF to avoid false interrupt.
+	 */
+	if (has_cntpoff())
+		kvm_ptimer_cval_save_restore(vcpu, true);
+
  	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);

  	/*
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index ea77a569a907..ce3f4d9e7dd4 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -117,6 +117,7 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);

  void kvm_timer_init_vhe(void);
+void kvm_ptimer_cval_save_restore(struct kvm_vcpu *vcpu, bool save);

  #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
  #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])


Thanks,
Ganapat

> Thanks,
> 
> 	M.
>
diff mbox series

Patch

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 98b0e8ac02ae..be609b12827d 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -955,6 +955,38 @@  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 		kvm_timer_blocking(vcpu);
 }
 
+static void ptimer_cval_adjust(struct arch_timer_context *ctx, bool inc)
+{
+	struct arch_timer_cpu *timer = vcpu_timer(ctx->vcpu);
+	unsigned long flags;
+	u64 cval, offset;
+
+	if (!timer->enabled || !ctx->loaded)
+		return;
+
+	local_irq_save(flags);
+	offset = timer_get_offset(ctx);
+	if (offset) {
+		cval = read_sysreg_el0(SYS_CNTP_CVAL);
+		if (inc)
+			cval += offset;
+		else
+			cval -= offset;
+		write_sysreg_el0(cval, SYS_CNTP_CVAL);
+		isb();
+	}
+	local_irq_restore(flags);
+}
+
+void kvm_ptimer_cval_adjust(struct kvm_vcpu *vcpu, bool inc)
+{
+	struct timer_map map;
+
+	get_timer_map(vcpu, &map);
+	if (map.direct_ptimer)
+		ptimer_cval_adjust(map.direct_ptimer, inc);
+}
+
 void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
 {
 	/*
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 561cb53e19ce..0cdcefc1351d 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -100,6 +100,10 @@  static void __activate_traps(struct kvm_vcpu *vcpu)
 		hcr |= vhcr_el2;
 	}
 
+	/* Decrement/Restore CVAL by CNTPOFF. */
+	if (has_cntpoff())
+		kvm_ptimer_cval_adjust(vcpu, false);
+
 	___activate_traps(vcpu, hcr);
 
 	val = read_sysreg(cpacr_el1);
@@ -141,6 +145,15 @@  static void __deactivate_traps(struct kvm_vcpu *vcpu)
 
 	___deactivate_traps(vcpu);
 
+	/*
+	 * For VHE Host, HCR_EL2.{E2H, TGE} is {1, 1}, hence FEAT_ECV
+	 * is disabled and CNTPOFF_EL2 value is treated as zero.
+	 * Need to increment CVAL for non-zero CNTPOFF to avoid
+	 * false PTIMER interrupt.
+	 */
+	if (has_cntpoff())
+		kvm_ptimer_cval_adjust(vcpu, true);
+
 	write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2);
 
 	/*
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index ea77a569a907..3ce6f02a0d9b 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -117,6 +117,7 @@  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
+void kvm_ptimer_cval_adjust(struct kvm_vcpu *vcpu, bool inc);
 
 #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
 #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])